What are the different categories of missing data?

  • Missing Completely at Random (MCAR): If data is missing completely at random, the probability that a value is missing is unrelated to any variable, observed or unobserved, so there is nothing systematic about the missing values. In this case it is usually safe to exclude the observations with missing data entirely, at the cost of a smaller sample, or to use a simple imputation technique such as the mean of the data, keeping in mind that mean imputation understates the variance. Mathematically, the missing observations and the complete observations are assumed to be drawn from the same underlying distribution.
  • Missing at Random (MAR): In this category, the missing observations no longer come from the same distribution as the complete observations. Instead, the probability of missingness is a function of an observed attribute within the data. For example, suppose a researcher at a university is using Gender and SAT Score to predict 1st Year GPA, and women are more likely to have taken the SAT than men. SAT Score is then MAR, because its missingness depends on Gender, which is recorded for every student. Because this kind of missingness can bias the results, it may be necessary to adjust for the attribute that is believed to be correlated with the missingness.
  • Missing Not at Random (MNAR): In this case, the missingness is believed to be systematically associated with data that is not observed or collected. In the example of predicting 1st Year GPA, if students from lower socioeconomic brackets were less likely to take the SAT and socioeconomic status was not recorded, the missing scores would be MNAR. The missing data can therefore bias any conclusions reached, and there is no simple weighting adjustment on an observed independent variable that can undo the bias.
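
The three categories can be illustrated with a small simulation. The sketch below is not taken from any real dataset: the population size, the score model, the missingness probabilities, and names such as `is_female`, `ses`, and `gender_adjusted_mean` are all assumptions made up for illustration. It generates SAT scores, removes some of them under each mechanism, and compares the mean of the remaining scores with the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical synthetic population; every number here is an assumption.
is_female = rng.random(n) < 0.5
ses = rng.normal(0.0, 1.0, n)  # socioeconomic index, treated as NOT recorded
sat = 1000 + 40 * is_female + 80 * ses + rng.normal(0.0, 100.0, n)

# MCAR: every score has the same 30% chance of being missing.
observed_mcar = rng.random(n) > 0.3

# MAR: the chance of having a score depends only on Gender, which is observed.
observed_mar = rng.random(n) < np.where(is_female, 0.9, 0.5)

# MNAR: the chance of having a score depends on the unrecorded SES index.
observed_mnar = rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * ses))


def gender_adjusted_mean(observed):
    """Average observed scores within each gender, then weight each group
    by its share of the full sample (Gender is known for everyone)."""
    total = 0.0
    for group in (True, False):
        in_group = is_female == group
        total += in_group.mean() * sat[in_group & observed].mean()
    return total


true_mean = sat.mean()
mean_mcar = sat[observed_mcar].mean()
mean_mar = sat[observed_mar].mean()
mean_mar_adjusted = gender_adjusted_mean(observed_mar)
mean_mnar = sat[observed_mnar].mean()
mean_mnar_adjusted = gender_adjusted_mean(observed_mnar)

print(f"true mean:            {true_mean:.1f}")
print(f"MCAR, observed only:  {mean_mcar:.1f}")
print(f"MAR, observed only:   {mean_mar:.1f}")
print(f"MAR, gender-adjusted: {mean_mar_adjusted:.1f}")
print(f"MNAR, observed only:  {mean_mnar:.1f}")
print(f"MNAR, gender-adjusted: {mean_mnar_adjusted:.1f}")
```

Under these assumptions, the MCAR mean should land close to the true mean, the unadjusted MAR mean should drift upward because women are over-represented among test takers, and reweighting by Gender should remove most of that drift. The MNAR mean should stay biased even after the Gender adjustment, since the variable driving the missingness is not in the data.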
