What is IDF? Why do we need IDF?

Inverse Document Frequency (IDF) builds on Term Frequency (TF) by down-weighting words that appear in many of the documents in a corpus. This reduces the importance given to words that are common in general rather than characteristic of one specific document. Multiplying Term Frequency by Inverse Document Frequency gives the TF-IDF score, which is commonly used to turn documents into numeric features before performing text classification. A high Term Frequency only means that a word appears often in a given document. A high TF-IDF score is a better indication that the word is important to that document relative to the rest of the corpus.
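To make the idea concrete, here is a minimal Python sketch that computes TF-IDF from scratch. It assumes one common textbook variant: TF is the raw count of a word divided by the document's length, and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the word. Other definitions exist (for example, scikit-learn's `TfidfVectorizer` applies a smoothed IDF and vector normalization by default), so exact numbers will differ between implementations. The function names `compute_idf` and `tf_idf` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def compute_idf(docs):
    """Return idf[t] = log(N / df(t)) for every term in the corpus (unsmoothed variant)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tf_idf(doc, idf):
    """Return TF-IDF scores for one tokenized document, with TF = count / doc length."""
    counts = Counter(doc)
    total = len(doc)
    return {term: (count / total) * idf[term] for term, count in counts.items()}

# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

idf = compute_idf(docs)
scores = tf_idf(docs[0], idf)

# Print terms of the first document from highest to lowest TF-IDF.
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:>4}  tf-idf={score:.3f}")
```

In the first document, "the" has the highest Term Frequency (2 of 6 words), but because it appears in every document its IDF is log(3/3) = 0, so its TF-IDF score is 0. "mat" appears only once, yet it receives the highest TF-IDF score because no other document contains it.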

