What is Normalization? 

Normalization, often referred to as ‘Min-Max Scaling’, is a ‘Feature Scaling’ technique used when preprocessing numerical data as we prepare our ‘Training Data’. The purpose, as with most preprocessing techniques, is to transform the data into a format better suited to the predictive modeling we intend to use it for.

As we know, Features that have grossly different scales and magnitudes can adversely affect some predictive models, resulting in poor predictive capability. This is especially true of algorithms that rely on the Euclidean distance between data points, such as k-nearest neighbours and k-means clustering.
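To make this concrete, here is a minimal sketch in Python, using hypothetical age and income values (the exact numbers are illustrative only), of how a Feature with a large magnitude can dominate a Euclidean distance calculation:

import numpy as np

# Two hypothetical data points: [age in years, income in dollars]
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])

# The raw Euclidean distance is dominated almost entirely by income:
# sqrt(35**2 + 1000**2) is roughly 1000.6, so the 35-year age gap
# barely registers next to the 1,000-dollar income gap.
print(np.linalg.norm(a - b))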

So that this does not happen we can utilize Normalization, removing the influence of differences in scale by re-scaling each ‘Feature’ onto the unit interval: the minimum value maps to 0, the maximum value maps to 1, and every other value falls between 0…1. This changes the absolute distances between data points but maintains the relative distances within each Feature. It is also convenient when zero values exist within the data, such as in a ‘Sparse Array’: when a Feature’s minimum value is 0, as is typical of sparse data, its zero entries remain zero after scaling.
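As a minimal sketch, assuming a single NumPy feature vector of illustrative income values, the re-scaling can be written as:

import numpy as np

def min_max_normalize(x):
    # Rescale a feature so its minimum maps to 0 and its maximum to 1.
    return (x - x.min()) / (x.max() - x.min())

income = np.array([30_000.0, 50_000.0, 51_000.0, 90_000.0])
print(min_max_normalize(income))  # [0.0, 0.333..., 0.35, 1.0]

For whole Feature matrices, scikit-learn provides the same behaviour through sklearn.preprocessing.MinMaxScaler, whose default feature_range is (0, 1).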

Normalization removes the problems associated with wildly different Feature scales and magnitudes, but it comes at the cost of sensitivity to outliers within the data. Because the relative distances to an outlier are maintained, a single extreme value can leave the rest of the ‘Training Data’ bunched towards 0 or 1.
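The sketch below, reusing the illustrative income values from above with one extreme outlier added, shows this bunching effect:

import numpy as np

# One extreme outlier (1,000,000) among otherwise comparable incomes.
income = np.array([30_000.0, 50_000.0, 51_000.0, 90_000.0, 1_000_000.0])
scaled = (income - income.min()) / (income.max() - income.min())

# All the non-outlier values are now squeezed into roughly [0, 0.062],
# while the outlier alone sits at 1.0.
print(scaled)  # [0.0, 0.0206..., 0.0216..., 0.0618..., 1.0]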
