This repository contains solutions to a set of collaborative filtering tasks, implemented with Python's Surprise library. The project covers:

- tuning user-based K-Nearest Neighbors (K-NN) by selecting the number of neighbors K that minimizes the Mean Absolute Error (MAE) under different levels of sparsity,
- mitigating the sparsity problem with the Funk variant of SVD (Singular Value Decomposition),
- comparing K-NN and SVD on Top-N recommendation quality with varying proportions of missing ratings.
- Clone the repository:

```sh
git clone git@github.com:ikajdan/collaborative_filtering_benchmarking.git
cd collaborative_filtering_benchmarking
```

- Create and activate a virtual environment:

```sh
python -m venv .venv
source .venv/bin/activate
```

- Install the required packages:

```sh
pip install -r requirements.txt
```

- Download the data:

```sh
curl -O https://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip
```
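With the archive extracted, the ratings can be loaded into Surprise straight from `ml-100k/u.data`. A minimal sketch; the path and the tab-separated `user item rating timestamp` layout are the standard MovieLens 100K format:

```python
from surprise import Dataset, Reader

# MovieLens 100K stores one rating per line: user, item, rating, timestamp,
# separated by tabs, with ratings on a 1-5 scale.
reader = Reader(line_format="user item rating timestamp", sep="\t", rating_scale=(1, 5))
data = Dataset.load_from_file("ml-100k/u.data", reader=reader)
```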
Task 1 Description
Given the dataset and the K-NN (K-Nearest Neighbors) algorithm, for user-based CF (Collaborative Filtering):
- Find the value of K that minimizes the MAE (Mean Absolute Error) with 25% of the ratings missing.
- Sparsity problem: find the value of K that minimizes the MAE with 75% of the ratings missing (see the sketch below).
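A minimal sketch of such a sweep, assuming a plain `KNNBasic` user-based model and a random hold-out split; the K range, Pearson similarity, and split seed here are illustrative, not necessarily the repository's exact protocol:

```python
from surprise import Dataset, KNNBasic, Reader, accuracy
from surprise.model_selection import train_test_split

reader = Reader(line_format="user item rating timestamp", sep="\t")
data = Dataset.load_from_file("ml-100k/u.data", reader=reader)

# test_size=0.25 hides 25% of the ratings; use 0.75 for the high-sparsity case.
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

best_k, best_mae = None, float("inf")
for k in range(10, 101, 2):
    # Pearson user-user similarity is an assumption; the project may use another measure.
    algo = KNNBasic(k=k, sim_options={"name": "pearson", "user_based": True}, verbose=False)
    algo.fit(trainset)
    mae = accuracy.mae(algo.test(testset), verbose=False)
    if mae < best_mae:
        best_k, best_mae = k, mae

print(f"Best K: {best_k}, MAE: {best_mae:.4f}")
```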
The MAE is consistently lower in the 25% missing case across all values of K. This is expected: more training data generally yields better predictive accuracy. With 75% of ratings missing, the algorithm struggles with sparsity, since fewer neighbors have ratings that overlap with the target user's, making predictions less reliable. Higher sparsity can therefore benefit from larger neighborhoods, which aggregate more data points and reduce error.
For both sparsity levels, the MAE decreases as K increases, reaching a minimum before leveling off. For this dataset the optimal values are:
- 25% missing ratings: K = 58 (MAE = 0.7504).
- 75% missing ratings: K = 55 (MAE = 0.7885).
Task 2 Description
Mitigation of the sparsity problem: show how the Funk variant of SVD (Singular Value Decomposition) can achieve a better MAE than user-based K-NN on the provided dataset.
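Surprise's `SVD` class implements this Funk-style, SGD-trained matrix factorization, so the comparison only needs both models evaluated on the same split. A sketch under the same hold-out assumptions as above; the hyperparameters are Surprise defaults, not tuned values:

```python
from surprise import SVD, Dataset, KNNBasic, Reader, accuracy
from surprise.model_selection import train_test_split

reader = Reader(line_format="user item rating timestamp", sep="\t")
data = Dataset.load_from_file("ml-100k/u.data", reader=reader)

# 75% hidden ratings: the high-sparsity setting where SVD's advantage shows most.
trainset, testset = train_test_split(data, test_size=0.75, random_state=42)

for name, algo in [
    # K = 55 is the best K reported above for the 75% missing case.
    ("user-based K-NN", KNNBasic(k=55, sim_options={"user_based": True}, verbose=False)),
    ("Funk SVD", SVD(n_factors=100, n_epochs=20, random_state=42)),
]:
    algo.fit(trainset)
    print(name, accuracy.mae(algo.test(testset), verbose=False))
```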
In both the 25% and 75% missing-ratings cases, Funk SVD outperforms K-NN. This suggests that SVD handles high sparsity more effectively: by factorizing the rating matrix into latent user and item factors, it can generalize from the available ratings rather than depending on neighbors with overlapping rating profiles. The precision, recall, and F1 scores likewise favor SVD in both cases, indicating that it is better at surfacing relevant items and producing accurate recommendations.

MAE, Precision, Recall, and F1 Score comparison between SVD and K-NN for 25% and 75% missing ratings.
Task 3 Description
Top-N recommendations: calculate precision, recall, and F1 for different values of N (10 to 100) using user-based K-NN (with the best K values found above) and SVD. Assume that the relevant items for a given user are those rated 4 or 5 stars in the dataset. Perform the calculations for both 25% and 75% missing ratings.
Explain why you think that the results reported in the three tasks make sense.
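A sketch of the metric computation, adapted from the precision/recall-at-k recipe in the Surprise documentation; relevance follows the task's 4-star threshold, and averaging per-user scores at the end is one common convention:

```python
from collections import defaultdict

def precision_recall_at_n(predictions, n=10, threshold=4.0):
    """Per-user precision@N and recall@N, with relevance = true rating >= threshold."""
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions, recalls = {}, {}
    for uid, ratings in user_est_true.items():
        # Rank this user's test items by estimated rating and take the top N.
        ratings.sort(key=lambda x: x[0], reverse=True)
        top_n = ratings[:n]
        n_rel = sum(true_r >= threshold for _, true_r in ratings)
        n_rec_rel = sum(true_r >= threshold for _, true_r in top_n)
        precisions[uid] = n_rec_rel / len(top_n) if top_n else 0
        recalls[uid] = n_rec_rel / n_rel if n_rel else 0
    return precisions, recalls

# Usage, given `predictions` from algo.test(testset):
# precisions, recalls = precision_recall_at_n(predictions, n=10)
# p = sum(precisions.values()) / len(precisions)
# r = sum(recalls.values()) / len(recalls)
# f1 = 2 * p * r / (p + r) if p + r else 0
```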
As N increases, precision decreases while recall stays high. With a larger N the model recommends more items, which captures more of the relevant ones (raising recall) but inevitably admits more irrelevant ones (lowering precision). This pattern holds for both 25% and 75% missing ratings. The F1 score reflects the trade-off, peaking at smaller N and declining as N grows, particularly under higher sparsity. K-NN and SVD show similar curves, suggesting that their ranking abilities are comparable in this scenario.