Multicollinearity occurs when one or more predictors experience high correlation between each other. This creates a problem for the model to be able to precisely estimate the coefficients, as it has difficulty honing on the exact effect of each variable. Signs of multicollinearity include a significant overall model but no individual significant coefficients and unexpected coefficient signs, such as a negative sign on a variable that clearly should have a positive association with the target. Multicollinearity can be detected by examining a correlation matrix of the predictors and identifying pairs of predictors with high correlation, such as 0.7 or above. It can also be measured through the Variance Inflation Factors (VIFs), of which high values (especially 10 or above) indicate variables that are causing an issue with multicollinearity.
What is multicollinearity and how can that be identified?
Help us improve this post by suggesting in comments below:
– modifications to the text, and infographics
– video resources that offer clear explanations for this question
– code snippets and case studies relevant to this concept
– online blogs, and research publications that are a “must read” on this topic
Partner Ad