How do outliers affect the clusters formed in K-Means?

Updated: March 26, 2023

Because K-Means is a distance-based algorithm, outliers can have several undesired effects on the quality of the clusters it produces. The objective of K-Means is to minimize the within-cluster sum of squares, that is, the sum of squared distances from each observation to its cluster’s centroid. Because the distances are squared, an outlier that lies far from every centroid contributes disproportionately to this objective: it pulls the centroid of its cluster toward itself, and the minimum that the objective reaches is much higher than it would be if the outlier were not present. A small number of outliers can also result in clusters that contain only a few observations, which can obscure the practical conclusions of what the clusters represent. This further emphasizes the importance of scaling the data before a clustering algorithm is trained, but even after scaling, noticeable outliers should be investigated further.
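The effect described above can be illustrated with a small sketch. It is not taken from any particular library: `kmeans_1d` is a hypothetical helper that runs plain Lloyd-style K-Means on one-dimensional toy data, and the data values and the min/max initialization are assumptions chosen only to keep the run deterministic.

```python
def kmeans_1d(points, init_centroids, max_iter=100):
    """Plain Lloyd-style K-Means on 1-D data (illustrative helper).

    Returns (labels, centroids, wcss), where wcss is the
    within-cluster sum of squares.
    """
    centroids = list(init_centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            for x in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break  # centroids stopped moving, so the run has converged
        centroids = new_centroids
    wcss = sum((x - centroids[lab]) ** 2 for x, lab in zip(points, labels))
    return labels, centroids, wcss


# Toy data (assumed for illustration): two tight groups near 1 and 5.
clean = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
# The same data with a single far-away observation appended.
with_outlier = clean + [100.0]

# Deterministic initialization (an assumption): start from the min and max.
labels_clean, cent_clean, wcss_clean = kmeans_1d(
    clean, [min(clean), max(clean)]
)
labels_out, cent_out, wcss_out = kmeans_1d(
    with_outlier, [min(with_outlier), max(with_outlier)]
)

for name, labels, cents, wcss in [
    ("clean", labels_clean, cent_clean, wcss_clean),
    ("with outlier", labels_out, cent_out, wcss_out),
]:
    sizes = [labels.count(j) for j in range(len(cents))]
    print(name, "| centroids:", [round(c, 2) for c in cents],
          "| sizes:", sizes, "| WCSS:", round(wcss, 2))
```

On this toy data, the clean run separates the two groups with centroids near 1 and 5. With the outlier added, one of the two clusters contains only the outlier, the two real groups are merged under a single centroid near 3, and the within-cluster sum of squares rises from about 0.2 to about 40.2.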

Author

AIML.com
