What is the Jaccard Index / Distance?

The Jaccard Index measures the similarity of two sets as the ratio of the number of items present in both sets (the intersection) to the number of distinct items present in either set (the union). It ranges from 0 (no shared items) to 1 (identical sets). Because a larger Jaccard Index indicates more similar sets, it can be converted to a distance metric by subtracting it from 1. It is commonly used to measure text similarity, as documents can be decomposed into sets of the words they contain. For two sets X1 and X2, the Jaccard Distance is given by

$$d_J(X_1, X_2) = 1 - \frac{|X_1 \cap X_2|}{|X_1 \cup X_2|}$$
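As a rough illustration of the definition, the following Python sketch computes the Jaccard Index and Jaccard Distance for two short texts. The function names (`jaccard_index`, `jaccard_distance`, `tokenize`), the lowercase whitespace tokenization, and the convention that two empty sets have an index of 1 are all assumptions made for this example, not part of the definition itself.

```python
def jaccard_index(a, b):
    """Return |a ∩ b| / |a ∪ b| for two iterables treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as identical (index 1.0),
        # since the ratio 0/0 is otherwise undefined.
        return 1.0
    return len(set_a & set_b) / len(union)


def jaccard_distance(a, b):
    """Return the Jaccard Distance, i.e. 1 minus the Jaccard Index."""
    return 1.0 - jaccard_index(a, b)


def tokenize(text):
    # Assumption: a document is reduced to its set of lowercase,
    # whitespace-separated words; real pipelines often do more cleaning.
    return set(text.lower().split())


doc1 = tokenize("the cat sat on the mat")
doc2 = tokenize("the cat lay on the rug")

# The two documents share 3 words ("the", "cat", "on") out of 7 distinct words.
print(jaccard_index(doc1, doc2))     # 3/7, roughly 0.43
print(jaccard_distance(doc1, doc2))  # 4/7, roughly 0.57
```

Because the texts are converted to sets, repeated words such as "the" count only once, so the measure reflects vocabulary overlap rather than word frequency.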
