- Predicted values would not be constrained to the range of [0,1], resulting in predictions that are not valid probabilities.
- The outcome only takes on values of 0 and 1, and thus it is not a continuous, normal distribution. Thus, a regression line would seek to fit something that connects the observations with a label of 0 to those having a label of 1, which is a poor representation for such a setup.
- Since the theoretical variance of a Bernoulli distribution is p(1-p), it is a function of the mean, and thus it does not exhibit constant variance (one of the assumptions for linear regression model to hold true)
What problems would arise from using a regular linear regression to model a binary outcome?
Help us improve this post by suggesting in comments below:
– modifications to the text, and infographics
– video resources that offer clear explanations for this question
– code snippets and case studies relevant to this concept
– online blogs, and research publications that are a “must read” on this topic
Partner Ad