Absolute and Weighted Frequency of Words in Text

https://github.com/eliasdabbas/word_frequency/blob/master/abs_weighted_frequency.ipynb

An important set of metrics in text mining relates to the frequency of words (or any token) in a certain corpus of text documents. However, you can also use an additional set of metrics in cases where each document has an associated numeric value describing a certain attribute of the document.

Some examples:

Tweets and their respective number of engagements.
URLs and their pageviews and bounces.
Movie titles and their gross revenue.
Keywords and their impressions, clicks, and conversions.

In this tutorial:

You will first create a simple function that calculates and compares the absolute and weighted occurrence of words in a corpus of documents. The weighted counts can sometimes uncover hidden trends and aggregates that aren't clear from looking at the top ten or so values, and they often differ from the absolute word frequency.
Then, you will apply the function to a real-life data set (movie titles and their gross revenue) to look for hidden trends. A teaser: love will come up somehow!

You will use Python, relying on the collections module's defaultdict data structure for the heavy lifting and on pandas DataFrames to manage the final output.
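
To make the idea concrete, here is a minimal sketch of such a function. It is not the notebook's implementation: the function name word_frequency, the column names abs_freq and wtd_freq, the whitespace tokenization, and the sample data are all assumptions made for illustration. It also assumes that a word's weighted frequency is the sum of the numeric values of the documents it appears in, counted once per occurrence.

```python
from collections import defaultdict


def word_frequency(documents, values):
    """Return one row per word with its absolute and weighted frequency.

    documents: iterable of strings
    values: iterable of numbers, one per document (hypothetical weights)
    """
    # Maps each word to [absolute count, weighted count]
    totals = defaultdict(lambda: [0, 0])
    for text, value in zip(documents, values):
        # Assumption: lowercase and split on whitespace.
        # Real text usually needs more careful tokenization.
        for word in text.lower().split():
            totals[word][0] += 1      # one more occurrence
            totals[word][1] += value  # add the document's value

    rows = [
        {"word": word, "abs_freq": abs_freq, "wtd_freq": wtd_freq}
        for word, (abs_freq, wtd_freq) in totals.items()
    ]
    # Highest weighted frequency first
    rows.sort(key=lambda row: row["wtd_freq"], reverse=True)
    return rows


# Invented titles and values, for illustration only
titles = ["love story", "war story", "love again"]
revenue = [100, 30, 50]
rows = word_frequency(titles, revenue)
for row in rows:
    print(row)
```

In this toy data "love" and "story" both occur twice, yet "love" ranks first by weighted frequency because its documents carry larger values. The list of dictionaries can be passed to pandas.DataFrame(rows) if a tabular output is preferred.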
