What problems would arise from using a regular linear regression to model a binary outcome?

  • Predicted values would not be constrained to the range of [0,1], resulting in predictions that are not valid probabilities. 
  • The outcome only takes on values of 0 and 1, and thus it is not a continuous, normal distribution. Thus, a regression line would seek to fit something that connects the observations with a label of 0 to those having a label of 1, which is a poor representation for such a setup.
  • Since the theoretical variance of a Bernoulli distribution is p(1-p), it is a function of the mean, and thus it does not exhibit constant variance (one of the assumptions for linear regression model to hold true)

Author

Help us improve this post by suggesting in comments below:

– modifications to the text, and infographics
– video resources that offer clear explanations for this question
– code snippets and case studies relevant to this concept
– online blogs, and research publications that are a “must read” on this topic

Leave the first comment

Partner Ad
Find out all the ways that you can
Contribute