Among the common machine learning algorithms, which require feature scaling, and which do not?

As a general rule of thumb, if any component of an algorithm's objective function involves a distance measure, whether between observations or to a central location, the data should be scaled before training. If the algorithm is rule-based, such as a decision tree, scaling is not necessary. Even when there is no explicit need, scaling the data is rarely wrong, but the change of scale should be kept in mind when interpreting the fitted model. Using this heuristic, the following is a (non-exhaustive) mapping of where some of the most common algorithms fall.
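
To see why distance-based objectives are sensitive to scale, here is a minimal numeric sketch (the feature values are made up purely for illustration):

```python
import numpy as np

# Toy example: two people described by income (dollars) and age (years).
# Without scaling, the Euclidean distance is driven almost entirely by
# income, simply because its numeric range is so much larger.
a = np.array([50_000.0, 25.0])
b = np.array([52_000.0, 60.0])

print(np.linalg.norm(a - b))          # ~2000: the 35-year age gap barely registers

# After z-scoring each feature (mean 0, standard deviation 1),
# both features contribute comparably to the distance.
X = np.array([a, b])
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.linalg.norm(Xz[0] - Xz[1]))
```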

Scaling is Necessary

  • Neural Networks (mainly to aid convergence of the gradient-descent optimizer)
  • Regularized Regression (Ridge, LASSO, Elastic Net, etc.)
  • Support Vector Machine
  • K-Nearest Neighbors (illustrated in the sketch after this list)
  • K-Means
  • Dimensionality Reduction (PCA, Factor Analysis)
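
As an illustration for this distance-based group, a common pattern is to standardize features before a K-Nearest Neighbors classifier. The sketch below assumes scikit-learn is available and uses its built-in wine dataset purely for illustration; on this dataset the unscaled score is typically noticeably lower.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# The wine features span very different ranges (e.g. proline vs. hue),
# so an unscaled KNN is dominated by the large-magnitude features.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

print("KNN without scaling:", unscaled.score(X_test, y_test))
print("KNN with scaling:   ", scaled.score(X_test, y_test))
```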

Scaling is Not Necessary

  • Ordinary Regression (plain linear or GLM regression without regularization)
    • However, if optimization is done via gradient descent, scaling the data helps with convergence.
  • Decision Tree Methods (CART, Random Forest, GBM, etc.); their scale-invariance is illustrated in the sketch after this list
  • Naive Bayes
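
To illustrate the tree-based case, the sketch below (again assuming scikit-learn, with the wine dataset used only as an example) fits a decision tree on raw and on standardized versions of the same data. Because a tree splits each feature at a threshold, a monotonic per-feature rescaling moves the thresholds but produces the same partitions, so the predictions come out identical.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same tree-growing procedure on raw and standardized features.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

print("Identical predictions with and without scaling:",
      np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled)))
```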
