
NLP Data Preparation

  • Q. What are Embeddings?
  • Q. What is the Bag-of-Words model? Explain using an example.
  • Q. What are some use cases of the Bag-of-Words model?
  • Q. What are the advantages and disadvantages of the Bag-of-Words model?
  • Q. What is an N-gram language model? Explain how it works in detail.
  • Q. What are the advantages and disadvantages of an N-gram model?
  • Q. What is Lemmatization?
  • Q. What is Term Frequency (TF)?
  • Q. What is IDF? Why do we need IDF?
  • Q. What is tokenization?
  • Q. What is a Vector Space Model?
  • Q. What is Vector Normalization? How is that useful?
  • Q. How to identify Stop Words?
  • Q. What is the problem with using a generic list of stop words?
  • Q. In what cases (and why) does using Binary Occurrence instead of TF-IDF make more sense?
  • Q. What happens to new words that appear in the test dataset but are not present in the training data?
  • Q. What is Laplace Smoothing? What is Additive Smoothing? Why do we need smoothing in IDF?
  • Q. What is meant by Corpus and Vocabulary in Natural Language Processing?
© 2025 AIML.COM  |  ♥ Sunnyvale, California