How is clustering affected by high-dimensional data, and how can the quality of clusters generated be improved in such cases?

One problem with performing clustering on high-dimensional data is that common distance metrics, such as Euclidean distance, become less meaningful as the number of dimensions grows. This is one symptom of the curse of dimensionality: as dimensionality increases, the distances between pairs of points concentrate around a common value, so any two points look roughly equidistant and clusters based on distance become hard to separate. One way to deal with this is to first reduce the dimensionality with a technique such as PCA and then cluster the principal components rather than the original features. Alternatively, density-based algorithms such as DBSCAN can be better suited than K-Means for identifying clusters in high-dimensional data, although they too rely on distance computations and therefore also benefit from prior dimensionality reduction.
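The distance-concentration effect can be demonstrated directly. The sketch below (standard library only; the function name and sample sizes are illustrative choices, not from any particular library) draws random points in a unit hypercube and measures how the spread of pairwise Euclidean distances, relative to the smallest distance, shrinks as the dimensionality grows:

```python
import math
import random

random.seed(0)

def pairwise_distance_spread(n_points: int, dim: int) -> float:
    """Relative spread of pairwise Euclidean distances among random points.

    Returns (max_dist - min_dist) / min_dist for n_points uniform samples
    in the unit hypercube of the given dimension. As dim grows, this ratio
    shrinks toward zero: all points become nearly equidistant, which is the
    distance-concentration effect behind the curse of dimensionality.
    """
    points = [[random.random() for _ in range(dim)] for _ in range(n_points)]
    dists = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            )
            dists.append(d)
    return (max(dists) - min(dists)) / min(dists)

# The relative spread drops sharply as dimensionality increases,
# which is why nearest-neighbor distinctions degrade in high dimensions.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative spread={pairwise_distance_spread(100, dim):.3f}")
```

Running this shows a large relative spread in 2 dimensions and a much smaller one in 1000 dimensions, illustrating why distance-based clustering degrades unless the dimensionality is first reduced.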
