How does DBSCAN Clustering work, and in what cases is it useful?

Density-based clustering approaches, such as DBSCAN, tend to perform better than partitioning methods like K-Means when clusters are non-globular in shape or when the data contain noise. At a high level, DBSCAN separates observations into clusters by identifying regions of the feature space that contain large concentrations of data points and are separated from one another by sparser regions. It relies on two parameters: a radius that defines the neighborhood around each point (commonly called eps) and the minimum number of observations that must fall within that neighborhood for it to count as dense (commonly called minPts). Using these, the algorithm classifies points into three categories: core points, border points, or outliers. Core points are observations that have at least the minimum number of observations within their neighborhood. Border points lie within the neighborhood of a core point but have fewer than the minimum number of observations within their own neighborhood. Outliers are points that cannot be reached from any core point. A cluster is formed by linking core points that lie within each other's neighborhoods, together with the border points attached to them. Advantages of DBSCAN over K-Means are that the number of clusters does not have to be specified beforehand and that points far from any cluster are identified as outliers rather than being forced into a cluster.
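
To make the three categories concrete, here is a minimal sketch of the idea in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function names (`region_query`, `dbscan`), the parameter names (`eps`, `min_pts`), and the toy data are made up for this example. It also assumes that a point counts toward its own neighborhood, so a point is treated as core when at least `min_pts` points, itself included, lie within `eps` of it. The brute-force neighbor search is only suitable for small data sets.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return (labels, kinds): a cluster id per point (-1 = outlier) and its category."""
    n = len(points)
    neighbors = [region_query(points, i, eps) for i in range(n)]
    # Assumption: the point itself counts toward min_pts.
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a new cluster outward from this unassigned core point.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    # Only core points keep expanding the cluster;
                    # border points are attached but not expanded.
                    if is_core[q]:
                        frontier.append(q)
        cluster_id += 1
    kinds = [
        "core" if is_core[i] else "border" if labels[i] != -1 else "outlier"
        for i in range(n)
    ]
    return labels, kinds

# Hypothetical toy data: two small dense groups, one fringe point, one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (2, 1),   # group A; (2, 1) sits on its edge
    (10, 10), (10, 11), (11, 10), (11, 11),   # group B
    (5, 5),                                   # far from both groups
]
labels, kinds = dbscan(points, eps=1.1, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
print(kinds)
```

In this toy run the number of clusters (two) is never supplied; it falls out of `eps` and `min_pts`. The point `(2, 1)` is labeled a border point because it is within `eps` of the core point `(1, 1)` but has only two points in its own neighborhood, and `(5, 5)` is labeled an outlier because no core point can reach it.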
