A Term Frequency (TF) matrix has one row per document in the corpus, identified by its document ID, and one column per word in the vocabulary. The entry in row d and column w is the number of times word w occurs in document d; a value of 0 means that w does not appear in d. A large corpus usually has a large vocabulary, but each document uses only a small fraction of those words. As a result, most entries are 0, and the TF matrix is typically both large and sparse.
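The following is a minimal Python sketch of building a TF matrix. It makes two assumptions: the corpus is a dict that maps document IDs to raw text, and tokenization is naive (lowercase, then split on whitespace). The function names `build_tf_matrix` and `to_dense` are hypothetical. Each row stores only its nonzero counts, which mirrors how a sparse matrix avoids storing the many zeros.

```python
from collections import Counter


def build_tf_matrix(corpus):
    """Hypothetical helper: build a sparse term-frequency matrix.

    corpus: dict mapping document ID -> raw text.
    Returns (vocabulary, tf):
      vocabulary: sorted list of words (the columns)
      tf: dict mapping document ID (a row) -> Counter of word -> count,
          storing only nonzero entries.
    """
    # Assumption: naive tokenization (lowercase, split on whitespace).
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in corpus.items()}
    # The vocabulary is every word that occurs in at least one document.
    vocabulary = sorted(set().union(*tf.values()))
    return vocabulary, tf


def to_dense(vocabulary, tf):
    """Hypothetical helper: expand each sparse row into a full list of counts.

    A missing word yields 0, i.e. the word does not appear in that document.
    """
    return {doc_id: [counts.get(w, 0) for w in vocabulary] for doc_id, counts in tf.items()}


if __name__ == "__main__":
    corpus = {"d1": "the cat sat on the mat", "d2": "the dog sat"}
    vocabulary, tf = build_tf_matrix(corpus)
    print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
    for doc_id, row in to_dense(vocabulary, tf).items():
        print(doc_id, row)
    # d1 [1, 0, 1, 1, 1, 2]
    # d2 [0, 1, 0, 0, 1, 1]
```

In this toy corpus, only 8 of the 12 dense entries are nonzero. For a real corpus with a vocabulary of many thousands of words, the fraction of nonzero entries is far smaller. That is why practical implementations store the matrix in a sparse format rather than as dense rows.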
What is Term Frequency (TF)?