The goal is to break text documents down into topics, word by word, and to see how topics are modelled with different approaches. We want to find "topics": collections of words that tend to appear together in similar documents.
Two popular libraries for LDA/LSA are scikit-learn and gensim. I chose gensim for this project.
Install the latest versions of the libraries listed in the requirements and dependencies
Run get_historical_news.py to collect the 500 latest news articles: python get_historical_news.py
Comment out the Colab setup cells and change the data path in the notebooks
Run model_preparation.ipynb to produce the data
Run Topic Modeling-LDA.ipynb for LDA topic modeling
Run Topic Modeling-LSA.ipynb for LSA topic modeling
About
Retrieving real-time breaking news from https://www.reuters.com/breakingviews and building topic models using Latent Dirichlet Allocation and Latent Semantic Analysis