Description
Absolute and Weighted Frequency of Words in Text
https://github.com/eliasdabbas/word_frequency/blob/master/abs_weighted_frequency.ipynb
An important set of metrics in text mining relates to the frequency of words (or any other token) in a corpus of text documents. When each document also has an associated numeric value describing some attribute of that document, an additional set of metrics becomes available.
Some examples:
Tweets and their respective number of engagements.
URLs and their pageviews and bounces.
Movie titles and their gross revenue.
Keywords and their impressions, clicks, and conversions.
In this tutorial,
You will first create a simple function that calculates and compares the absolute and weighted occurrence of words in a corpus of documents. This can sometimes uncover hidden trends and aggregates that aren't necessarily clear from looking at the top ten or so values. The weighted frequencies can also differ noticeably from the absolute word frequencies.
Then, you will explore a real-life data set (movie titles and their gross revenue) in the hope of discovering hidden trends. A teaser: love will come up somehow!
You will use Python as the programming language, relying on the collections module's defaultdict data structure for the heavy lifting and on pandas DataFrames to manage the final output.
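To make the idea concrete before going further, here is a minimal sketch of how absolute and weighted counts could be collected with a defaultdict and returned as a pandas DataFrame. The function name `word_frequency`, its parameters, the column names, and the tiny data set are all illustrative assumptions (the numbers are made up), not necessarily what the tutorial's notebook uses.

```python
from collections import defaultdict

import pandas as pd


def word_frequency(text_list, num_list):
    """Return absolute and weighted word counts (hypothetical helper).

    text_list: iterable of documents (strings)
    num_list:  iterable of numbers, one per document
    """
    # Each word maps to [absolute count, weighted count].
    word_freq = defaultdict(lambda: [0, 0])
    for text, num in zip(text_list, num_list):
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            word_freq[word][0] += 1    # one more occurrence
            word_freq[word][1] += num  # add the document's value
    df = pd.DataFrame.from_dict(
        dict(word_freq), orient="index", columns=["abs_freq", "wtd_freq"]
    )
    df.index.name = "word"
    return df.sort_values("wtd_freq", ascending=False)


# Made-up titles and values, purely for illustration.
titles = ["love story", "war story", "love actually"]
values = [100, 20, 50]
freq = word_frequency(titles, values)
print(freq)
```

In this toy example "love" and "story" both appear twice, but "love" accumulates a weighted count of 150 against 120 for "story", which is the kind of difference between the two metrics that the tutorial sets out to explore.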