What are the different categories of missing data?

Updated: October 3, 2023

Missing Completely at Random (MCAR): If data is missing completely at random, the missingness is unrelated to any variable, observed or unobserved, so there is nothing systematic about the missing values. In this case it is usually safe to exclude the observations with missing data entirely, or to use a simple imputation technique such as replacing missing values with the mean, although mean imputation understates the variance of the imputed variable. Mathematically, it is assumed that the missing observations and the complete observations are drawn from the same underlying distribution.
Missing at Random (MAR): In this category of missing data, the missing observations no longer come from the same distribution as the complete observations, but the missingness can be explained by an observed attribute within the data. For example, if a researcher at a university is using Gender and SAT Score to predict 1st Year GPA, and women are more likely than men to have taken the SAT, the missing SAT scores are MAR, because the missingness depends on Gender, which is recorded for every student. Because ignoring the missing data can bias the results, it might be necessary to adjust for the attribute that is believed to be correlated with the missingness.
Missing Not at Random (MNAR): In this case, the missingness is believed to be systematically associated with data that is not observed or collected. In the example of predicting 1st Year GPA, if students from lower socioeconomic brackets were less likely to take the SAT, and socioeconomic status was not recorded, this would be an example of MNAR. The missing data can therefore bias any conclusions reached, and, unlike the MAR case, there is no simple weighting adjustment on an observed variable that can undo the bias.
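
The three categories can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not a recommended analysis procedure: the variable names (`group`, `hidden`, `score`), the missingness rates, and the helper functions are all assumptions made for illustration. It compares the mean of the observed scores with the true mean under each mechanism, and then tries a simple adjustment that reweights by the observed attribute.

```python
import numpy as np


def simulate(n=100_000, seed=0):
    """Build synthetic scores and three missingness masks (all values are illustrative)."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)       # observed attribute, recorded for everyone
    hidden = rng.normal(0.0, 1.0, size=n)    # attribute that is NOT collected
    score = 500 + 30 * group + 50 * hidden + rng.normal(0.0, 50.0, size=n)

    masks = {
        # MCAR: every score has the same 30% chance of being missing
        "MCAR": rng.random(n) < 0.3,
        # MAR: the chance of being missing depends only on the observed group
        "MAR": rng.random(n) < np.where(group == 1, 0.1, 0.5),
        # MNAR: the chance of being missing depends on the uncollected attribute
        "MNAR": rng.random(n) < np.where(hidden < 0, 0.5, 0.1),
    }
    return group, score, masks


def observed_mean(score, missing):
    """Mean of the scores that are not missing (complete-case estimate)."""
    return score[~missing].mean()


def group_adjusted_mean(score, group, missing):
    """Weight each group's observed mean by that group's share of the full sample."""
    total = 0.0
    for g in np.unique(group):
        in_group = group == g
        total += in_group.mean() * score[in_group & ~missing].mean()
    return total


if __name__ == "__main__":
    group, score, masks = simulate()
    print(f"true mean: {score.mean():.1f}")
    for name, missing in masks.items():
        raw = observed_mean(score, missing)
        adjusted = group_adjusted_mean(score, group, missing)
        print(f"{name}: observed mean {raw:.1f}, group-adjusted mean {adjusted:.1f}")
```

Under these assumptions, the observed mean should stay close to the true mean for MCAR, drift away from it for MAR until the group adjustment is applied, and remain biased for MNAR even after the adjustment, because the attribute driving the missingness is not available to adjust for.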

Author

AIML.com
