In what cases (and why) does using Binary Occurrence instead of TF-IDF make more sense?
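Binary occurrence makes more sense when the vocabulary is small and the mere presence or absence of particular words is a strong feature for differentiating between document classes. In that case, simply using dummy variables to indicate whether each word appears can be a viable and less computationally intensive approach to text classification than TF-IDF, because no term-frequency or inverse-document-frequency weights need to be computed.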
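To make the contrast concrete, here is a minimal sketch in plain Python. The toy corpus, the whitespace tokenizer, and the helper names (`binary_occurrence`, `tf_idf`) are assumptions made up for illustration, and the TF-IDF variant shown (raw count times log(N / df)) is only one of several common definitions.

```python
import math

# Toy corpus, invented for illustration only.
docs = [
    "free prize claim your free prize now",
    "meeting agenda for monday",
    "claim your prize",
    "monday meeting moved",
]

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real pipelines do more).
    return text.lower().split()

# Small, fixed vocabulary built from the corpus itself.
vocab = sorted({tok for doc in docs for tok in tokenize(doc)})

def binary_occurrence(doc):
    # One dummy variable per vocabulary term: 1 if present, else 0.
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocab]

def tf_idf(doc):
    # Raw term count multiplied by log(N / document frequency).
    tokens = tokenize(doc)
    n_docs = len(docs)
    weights = []
    for term in vocab:
        tf = tokens.count(term)
        df = sum(1 for d in docs if term in tokenize(d))
        weights.append(tf * math.log(n_docs / df))
    return weights

binary_matrix = [binary_occurrence(d) for d in docs]
tfidf_matrix = [tf_idf(d) for d in docs]

free = vocab.index("free")
print(binary_matrix[0][free])  # presence flag only
print(tfidf_matrix[0][free])   # weighted by count and rarity
```

The binary features need only a set-membership check per term, while the TF-IDF features require counting occurrences in each document and document frequencies across the corpus. When the presence of a word such as "free" already separates the classes, the extra weighting may add little.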