Time Magazine

Time Magazine Scraper, Text Extraction (OCR), and Data Exploration with Topic Modelling

01.ipynb: the notebook containing all the code.
Open it in Colab to explore the topics (and their dominant terms) or to run the code.

Part 1: Scraping the Time Vault, 1923-2015.
Scraped Data
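
A minimal sketch of how a year-by-year download loop for Part 1 could look. This is not the notebook's code: the URL template, the `scans/<year>/` folder layout, and the `pages_per_year` parameter are all hypothetical placeholders, since the Time Vault URL scheme is not described here.

```python
import time
import urllib.request
from pathlib import Path

# Hypothetical placeholder: the real Time Vault URL layout is not given in this README.
URL_TEMPLATE = "https://example.invalid/vault/{year}/page-{page}.jpg"

FIRST_YEAR, LAST_YEAR = 1923, 2015  # range stated in this README


def page_jobs(pages_per_year, template=URL_TEMPLATE):
    """Yield (url, destination path) pairs for every year in the range."""
    for year in range(FIRST_YEAR, LAST_YEAR + 1):
        for page in range(1, pages_per_year + 1):
            url = template.format(year=year, page=page)
            yield url, Path("scans") / str(year) / f"{page:04d}.jpg"


def download(url, dest, delay_s=1.0):
    """Fetch one page image, skipping files that already exist."""
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=30) as resp:
        dest.write_bytes(resp.read())
    time.sleep(delay_s)  # pause between requests to stay polite
    return True
```

Skipping files that already exist lets an interrupted run resume without downloading everything again.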

Part 2: Text Extraction with Tesseract OCR.
So far, text has been extracted only for 2000-2015, because OCR is slow.
The extracted text is also quite noisy.
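
A rough sketch of the OCR step with a light cleanup pass for the noise, assuming the `pytesseract` wrapper and Pillow are installed alongside the Tesseract binary. The `scans/<year>/*.jpg` layout and the cleanup rules are illustrative assumptions, not what the notebook necessarily does.

```python
import re
from pathlib import Path


def ocr_image(image_path):
    """Run Tesseract on one scanned page (needs pytesseract, Pillow, and the tesseract binary)."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang="eng")


def clean_ocr_text(text):
    """Reduce common OCR noise; these rules are illustrative, not tuned."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace stray symbols with spaces, keeping basic punctuation.
    text = re.sub(r"[^\w\s.,;:!?'()-]", " ", text)
    # Drop isolated single characters (including lone digits) except "a" and "I".
    words = [w for w in text.split() if len(w) > 1 or w.lower() in {"a", "i"}]
    return " ".join(words)


def ocr_years(scan_root, first_year=2000, last_year=2015):
    """Yield (year, image path, cleaned text) for each page image.

    Assumes a hypothetical scan_root/<year>/*.jpg folder layout.
    """
    for year in range(first_year, last_year + 1):
        for image_path in sorted(Path(scan_root, str(year)).glob("*.jpg")):
            yield year, image_path, clean_ocr_text(ocr_image(image_path))
```

For example, `clean_ocr_text("The presi-\ndent spoke ~ | to a crowd")` returns `"The president spoke to a crowd"`.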

Part 3: Data Exploration with Topic Modelling.
TODO: Extend to all years and interpret the topics.
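
A small sketch of how topics and their dominant terms could be obtained. This README does not say which model the notebook uses; Latent Dirichlet Allocation from scikit-learn (version 1.0 or later) is assumed here, and the function names and default parameters are hypothetical.

```python
def fit_topics(documents, n_topics=10, max_features=5000):
    """Fit LDA on raw text documents; returns (topic-term weights, vocabulary).

    Assumes scikit-learn >= 1.0 is installed (for get_feature_names_out).
    """
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(stop_words="english", max_features=max_features)
    counts = vectorizer.fit_transform(documents)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    return lda.components_, list(vectorizer.get_feature_names_out())


def dominant_terms(topic_term_weights, vocabulary, n_terms=10):
    """Return the n_terms highest-weighted words for each topic."""
    result = []
    for weights in topic_term_weights:
        ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:n_terms]])
    return result
```

A typical call would be `weights, vocab = fit_topics(texts)` followed by `dominant_terms(weights, vocab)`, where `texts` is a list of cleaned OCR strings, one per article or page.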