While these procedures have the advantage of being largely automated, they should be used with caution. They have several drawbacks, including an inflated type I error rate caused by the many hypothesis tests performed along the way and a failure to properly address multicollinearity. Further, stepwise selection methods are computationally expensive to carry out, and they give the false impression that variable selection has been automated, even though they provide no guarantee of finding the best model or even one that is theoretically sound. Many modern machine learning algorithms have their own variable selection capabilities that don't suffer from these pitfalls.
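The inflated type I error rate can be illustrated with a small simulation. The sketch below is a minimal illustration, not a full stepwise implementation: it mimics only the first step of forward selection, where the candidate with the strongest association with the response is chosen and then tested as if it had been the only one considered. Every predictor is pure noise, so any "significant" result is a false positive. The function names, the sample sizes, and the use of the Fisher z approximation for the correlation test are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_step_false_positive_rate(n_obs=50, n_predictors=20,
                                   n_trials=200, alpha=0.05, seed=0):
    """Fraction of trials in which the best of several noise predictors
    passes a single-test significance threshold (hypothetical helper)."""
    rng = random.Random(seed)
    # Two-sided critical value for one test, ignoring that we picked the best.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(n_trials):
        # Response and predictors are independent noise: no true effects.
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best_abs_r = 0.0
        for _ in range(n_predictors):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            best_abs_r = max(best_abs_r, abs(pearson_r(x, y)))
        # Fisher z approximation to the test of zero correlation.
        z_stat = math.atanh(best_abs_r) * math.sqrt(n_obs - 3)
        if z_stat > z_crit:
            false_positives += 1
    return false_positives / n_trials


print("1 candidate:  ", first_step_false_positive_rate(n_predictors=1))
print("20 candidates:", first_step_false_positive_rate(n_predictors=20))
```

With a single candidate the rate should sit near the nominal 5%. With 20 independent candidates, the chance that at least one clears the threshold is about 1 - 0.95^20 ≈ 0.64, so the simulated rate should be far above 5% even though no predictor has any real effect. Exact values depend on the seed and the number of trials.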
What are some of the problems with stepwise selection approaches?