Goyam02/movie_recommend


🎬 Movie Recommendation System (Content-Based)


A lightweight content-based movie recommendation system built using TMDB metadata. It extracts key movie features, processes them into vectors, computes similarity scores, and serves the recommendations through an interactive Streamlit UI.


🚀 Features

  • Content-based filtering using cosine similarity

  • Metadata extraction:

    • Genres
    • Keywords
    • Top 3 cast members
    • Director
    • Overview
  • Stemming of normalized text (NLTK)

  • Vectorization via CountVectorizer

  • Fast, precomputed similarity matrix

  • Supports downloading large models from Hugging Face

  • Clean, interactive Streamlit interface
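The metadata extraction step listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the TMDB columns hold JSON list strings, the helper names (`names`, `directors`, `build_tags`) are hypothetical, and collapsing spaces inside names is an assumed convention for keeping multi-word names as single tokens. Stemming with NLTK would be applied to the resulting string afterwards and is not shown.

```python
import json


def names(json_text, limit=None):
    """Return the "name" field of each object in a JSON list string."""
    picked = [item["name"] for item in json.loads(json_text)]
    return picked if limit is None else picked[:limit]


def directors(crew_json):
    """Return the names of crew members whose job is "Director"."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]


def collapse(name):
    """Remove spaces so a multi-word name becomes one token (assumed convention)."""
    return name.replace(" ", "")


def build_tags(row):
    """Join overview, genres, keywords, top 3 cast and director into one lowercase string."""
    tokens = row["overview"].split()
    tokens += [collapse(n) for n in names(row["genres"])]
    tokens += [collapse(n) for n in names(row["keywords"])]
    tokens += [collapse(n) for n in names(row["cast"], limit=3)]
    tokens += [collapse(n) for n in directors(row["crew"])]
    return " ".join(tokens).lower()


# Hypothetical row shaped like the merged TMDB CSVs (values shortened).
row = {
    "overview": "A marine on an alien moon",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"id": 1463, "name": "space war"}]',
    "cast": '[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}, '
            '{"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]',
    "crew": '[{"job": "Editor", "name": "Stephen Rivkin"}, '
            '{"job": "Director", "name": "James Cameron"}]',
}
tags = build_tags(row)
```

Only the first three cast members contribute tokens, which matches the "Top 3 cast members" feature.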


📁 Project Structure

.
├── app.py                   # Streamlit UI application
├── main.ipynb               # Data preprocessing + similarity computation
├── movies.pkl               # Cleaned movie metadata
├── requirements.txt         # Dependencies for app & notebook
├── runtime.txt              # Python version (for Streamlit Cloud)
├── README.md                # Documentation
└── data/
    ├── tmdb_5000_movies.csv
    └── tmdb_5000_credits.csv

🧠 System Architecture

 TMDB CSVs
    │
    ▼
 Data Cleaning & Feature Extraction (main.ipynb)
    │
    ├── create movies.pkl
    └── compute similarity matrix → similarity.pkl
    │
    ▼
 Streamlit App (app.py)
    │
    ├── load movies.pkl
    ├── load similarity from local OR Hugging Face
    └── recommend top similar movies
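The "load similarity from local OR Hugging Face" step could look something like the sketch below. The function names are hypothetical and this is not taken from app.py. It uses the standard-library `urllib.request.urlretrieve` so the example has no third-party dependencies; the same pattern would work with `requests`, which the project lists in its requirements.

```python
import os
import pickle
import urllib.request


def ensure_file(path, url, download=urllib.request.urlretrieve):
    """Download url to path only when path does not exist yet."""
    if not os.path.exists(path):
        download(url, path)
    return path


def load_pickle(path):
    """Unpickle a file. Only use this on files from a source you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def load_similarity(path, url, download=urllib.request.urlretrieve):
    """Return the similarity matrix, fetching the file first if it is missing locally."""
    return load_pickle(ensure_file(path, url, download))
```

Passing the downloader as a parameter keeps the function easy to test without network access. In a Streamlit app, wrapping the loader in `st.cache_resource` would likely avoid reloading the matrix on every rerun.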

🛠️ Local Setup

1️⃣ Create a Virtual Environment

python -m venv venv

2️⃣ Activate the Environment

source venv/Scripts/activate  # Windows (Git Bash)
venv\Scripts\activate         # Windows (cmd or PowerShell)
source venv/bin/activate      # macOS / Linux

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the Streamlit App

streamlit run app.py

Your app will open at:

👉 http://localhost:8501


🧪 Regenerating Artifacts (optional)

Open and run:

main.ipynb

This notebook:

  • Cleans the TMDB dataset

  • Creates a tags column

  • Computes cosine similarity

  • Saves artifacts:

    • movies.pkl
    • similarity.pkl or similarity.npz (depending on your notebook code)
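To make the "computes cosine similarity" step concrete, here is a pure-Python stand-in. The notebook itself relies on scikit-learn (CountVectorizer and cosine similarity, per the feature list); this sketch swaps in `collections.Counter` for the count vectors so it runs with the standard library only. The function names and sample tags are illustrative assumptions, and this approach would be far too slow for the full dataset.

```python
import math
import pickle
from collections import Counter


def count_vector(text):
    """Bag-of-words counts for a whitespace-separated tags string."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity_matrix(tags):
    """Pairwise cosine similarity for a list of tags strings."""
    vectors = [count_vector(t) for t in tags]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical tags for three movies.
tags = ["space war alien", "space alien marine", "romantic comedy paris"]
similarity = similarity_matrix(tags)

# Serialized bytes; pickle.dump(similarity, fh) would write the same content to a file.
payload = pickle.dumps(similarity)
```

The first two movies share two of three tokens, so their score is about 0.67, while the third shares none with either and scores 0.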

☁️ Deploying on Streamlit Cloud

1. Push the repo to GitHub

(Do not commit large model files such as similarity.pkl; host them on Hugging Face instead.)

2. Add these two files:

runtime.txt

python-3.10.12

requirements.txt (example)

streamlit
numpy==1.25.2
pandas
scikit-learn
joblib
requests

3. Ensure app.py contains your HF URL:

HF_RAW_URL = "https://huggingface.co/<username>/<repo>/resolve/main/similarity.pkl"

4. Deploy

Go to: https://share.streamlit.io

  • New App
  • Choose your GitHub repo
  • Branch: main
  • Entry point: app.py
  • Deploy 🎉

Streamlit will download the model from Hugging Face on first run.


🌐 Hosting the similarity model on Hugging Face

Upload similarity.pkl to your HF repo.

Use the raw URL:

https://huggingface.co/<username>/<repo>/resolve/main/similarity.pkl

⚠️ Do NOT use the blob link; it points to an HTML page, not the file. Use /resolve/main/. For files stored with Git LFS, /raw/main/ returns only the LFS pointer, not the binary.
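As a small illustration of the URL rule above, the helpers below build a /resolve/ link and rewrite a copied /blob/ link. Both function names are hypothetical, and the sketch assumes a model repo; dataset repos carry an extra `/datasets/` path prefix that `resolve_url` does not add.

```python
def resolve_url(username, repo, filename, revision="main"):
    """Build a direct-download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{username}/{repo}/resolve/{revision}/{filename}"


def to_resolve_url(url):
    """Rewrite a /blob/ page link into a /resolve/ download link."""
    return url.replace("/blob/", "/resolve/", 1)
```

For example, `to_resolve_url` leaves a link that already uses /resolve/ unchanged, so it is safe to apply to any pasted URL.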


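🧠 Recommendation Logic

def recommend(movie):
    # movies (DataFrame) and similarity (matrix) are assumed to be loaded at module level
    movie_index = movies[movies['title'] == movie].index[0]  # row of the selected title
    distances = similarity[movie_index]                      # similarity to every movie
    movie_list = sorted(
        list(enumerate(distances)),
        reverse=True,
        key=lambda x: x[1]
    )[1:6]  # skip position 0 (the movie itself), keep the next five
    return [movies.iloc[i[0]].title for i in movie_list]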
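The snippet above assumes the selected title exists and that the movie itself always sorts first. A slightly more defensive variant, sketched here with plain lists instead of a DataFrame so it runs standalone, returns an empty list for unknown titles and excludes the query movie by index. The signature and the toy data are assumptions for illustration, not the app's code.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return up to top_n titles most similar to title, excluding title itself."""
    if title not in titles:
        return []
    idx = titles.index(title)
    scores = similarity[idx]
    # Rank every other movie by its similarity to the query, highest first.
    ranked = sorted(
        (i for i in range(len(titles)) if i != idx),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]


# Toy data: four movies and a hand-written symmetric similarity matrix.
titles = ["Avatar", "Aliens", "Titanic", "Notting Hill"]
similarity = [
    [1.0, 0.8, 0.3, 0.0],
    [0.8, 1.0, 0.1, 0.0],
    [0.3, 0.1, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]
top_two = recommend("Avatar", titles, similarity, top_n=2)
```

Excluding by index rather than slicing off the first result avoids dropping a different movie when two entries tie at the top score.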

❗ Common Issues & Fixes

πŸ”₯ GitHub rejecting large files?

GitHub doesn’t allow >100MB. Solution:

  • Upload large similarity file to Hugging Face
  • Let app.py download it at runtime

🔥 Streamlit build failing on numpy?

Use a wheel-friendly version:

numpy==1.25.2

Add runtime.txt:

python-3.10.12

🔥 Using private Hugging Face files?

Add a token to Streamlit Secrets.
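One way to use such a token is to send it as a Bearer header with the download request. The sketch below uses the standard library and a hypothetical helper name. In the app, the token would presumably be read from Streamlit Secrets, for example `st.secrets["HF_TOKEN"]`, where the key name `HF_TOKEN` is an assumption.

```python
import urllib.request


def build_request(url, token=None):
    """Create a request for url, adding a Bearer token header when one is given."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder URL and token; urllib.request.urlopen(request) would perform the download.
request = build_request(
    "https://huggingface.co/<username>/<repo>/resolve/main/similarity.pkl",
    token="hf_example_token",
)
```

Keeping the token out of the repository and passing it in at runtime means the same code works for both public and private files.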


🌟 Future Enhancements

  • TMDB poster integration
  • Movie detail pages
  • Hybrid recommender (Content + Collaborative Filtering)
  • Semantic similarity with Sentence Transformers
  • Compressed sparse similarity matrix

❀️ Acknowledgements

  • TMDB for the dataset
  • Streamlit for the UI
  • Hugging Face for large file hosting
  • Scikit-learn & Pandas for preprocessing
