Dataset: Click here to view the dataset
The dataset consists of roughly 10,000 unlabeled garage reviews. See the Data Analysis notebook for a basic analysis of the data.
The task is to classify these reviews into topics. Because the data is unlabeled, the first step is to group the reviews into topic clusters. This is an unsupervised topic modeling problem, and we use the Latent Dirichlet Allocation (LDA) algorithm to generate the topic clusters.
The LDA algorithm has been implemented in the LDA Model notebook.
Predictions were made by mapping the topics listed in the evaluation_labels.MD file to the LDA-generated topics. The Prediction notebook contains a brief explanation of the prediction approach and the ideas employed.
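The mapping step can be sketched as follows: each LDA topic id is assigned a human-readable label by hand, and a review receives the label of its most probable topic. The label names, the `topic_to_label` dictionary, and the `predict_label` helper are hypothetical; the actual labels come from evaluation_labels.MD, and the Prediction notebook may do this differently.

```python
def predict_label(topic_distribution, topic_to_label, default="unmapped"):
    """Return the label mapped to the most probable topic of one review."""
    best_topic = max(
        range(len(topic_distribution)),
        key=lambda k: topic_distribution[k],
    )
    return topic_to_label.get(best_topic, default)


# Hypothetical manual mapping from LDA topic ids to evaluation labels.
topic_to_label = {
    0: "pricing",
    1: "customer service",
    2: "repair quality",
}

# Hypothetical topic distribution for one review, as produced by an LDA model.
review_topics = [0.1, 0.7, 0.2]
print(predict_label(review_topics, topic_to_label))  # prints "customer service"
```

Taking the single most probable topic is the simplest choice. A review that spreads its probability across several topics could instead be given multiple labels by applying a threshold.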
Clone the repo and navigate to it
git clone https://github.com/praatibhsurana/Unsupervised-Topic-Modeling.git
cd Unsupervised-Topic-Modeling
Install the requirements with pip and launch Jupyter Notebook
pip install -r requirements.txt
jupyter notebook
- https://towardsdatascience.com/evaluate-topic-model-in-python-latent-dirichlet-allocation-lda-7d57484bb5d0
- https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0
- https://www.tutorialspoint.com/gensim/gensim_creating_lda_topic_model.htm#:~:text=Role%20of%20LDA&text=Every%20topic%20is%20modeled%20as,from%20a%20mixture%20of%20topics.
- https://radimrehurek.com/gensim/models/ldamodel.html
- https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
- https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/
- https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd
- https://towardsdatascience.com/latent-dirichlet-allocation-lda-9d1cd064ffa2