
Filtering train set in test_avg_metrics #20

Open
bkj opened this issue Jun 15, 2018 · 5 comments

Comments

@bkj

bkj commented Jun 15, 2018

Hi all --

In some other recommender systems, there's a flag to filter the items in the training set from the test metrics -- is there something like that in qmf?

That is, it doesn't make sense to compute p@k on the test set if the top-k predictions are allowed to contain items we already observed for that user in the train set, since we know those items won't appear in the test set.
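
Roughly, I mean something like this (a made-up Python sketch of the idea, not qmf code -- `filtered_precision_at_k` and its arguments are names I just invented):

```python
import numpy as np

# Made-up sketch: mask the user's training items before taking top-k,
# so p@k only counts items the model could plausibly recommend.
def filtered_precision_at_k(scores, train_items, test_items, k=10):
    scores = scores.copy()                    # scores: (n_items,) for one user
    scores[list(train_items)] = -np.inf       # filter out train-set items
    top_k = np.argpartition(-scores, k)[:k]   # indices of the k highest scores
    return len(set(top_k) & set(test_items)) / k
```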

Thanks

@albietz
Contributor

albietz commented Jun 15, 2018

Hi @bkj

I'm not sure I understand precisely what you're asking, but in matrix factorization models it wouldn't make sense to include items in the test metrics that do not appear in training, since you wouldn't have any latent factors for those items to make test predictions with.

Instead, the test dataset should contain (user, item) pairs that do not appear in the training data, and evaluation computes ranking metrics per user on this subset (after filtering out all rows whose user or item did not appear in the training data). The metrics are averaged over all test users, and there's an option to evaluate on a smaller number of test users, since this can be costly when there are many users.
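
In rough pseudocode, the flow is something like this (a hand-written Python sketch of the logic just described, not qmf's actual API -- `avg_test_metric`, `per_user_metric`, and `max_test_users` are made-up names):

```python
# Hand-written sketch of the evaluation flow (not qmf's actual code).
def avg_test_metric(test_pairs, train_users, train_items, per_user_metric,
                    max_test_users=None):
    # Keep only test rows whose user AND item both appeared in training,
    # since otherwise there are no latent factors to score them with.
    by_user = {}
    for user, item in test_pairs:
        if user in train_users and item in train_items:
            by_user.setdefault(user, set()).add(item)
    users = list(by_user)
    if max_test_users is not None:       # optional cap: evaluating every user
        users = users[:max_test_users]   # is costly when there are many
    # Average the per-user ranking metric (e.g. p@k) over the test users.
    return sum(per_user_metric(u, by_user[u]) for u in users) / max(len(users), 1)
```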

Hope this helps.

Alberto

@bkj
Author

bkj commented Jun 15, 2018 via email

@albietz
Contributor

albietz commented Jun 15, 2018

Gotcha, I wasn't aware of this optimization. Do you have pointers to papers/implementations discussing this?

-Alberto

@bkj
Author

bkj commented Jun 15, 2018

No papers off the top of my head, but I know they do it in dsstne (and probably other places). A script to do the filtering is here.

On the example I'm running (MovieLens-1M), doing this filtering increases p@10 from ~0.1 to ~0.25 -- so it's a nontrivial difference, and I think it's the right way to do evaluation.

~ Ben

@albietz
Contributor

albietz commented Jun 15, 2018

Hmm, this might be worth including, but at the same time I'm not convinced it's the right way to do evaluation either. For example, it might artificially boost the p@k of different users in different ways, depending on how many positive items appear for each user (because you would only filter positive items, and not negatives that the user may have seen).
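
To make that concrete (a throwaway synthetic simulation I just made up, not a real benchmark): with purely random scores, a user with more training positives gets a smaller filtered candidate pool, so their measured p@10 comes out higher even though the model has learned nothing.

```python
import numpy as np

# Synthetic illustration: filtering only training positives shrinks the
# candidate pool more for users with many train items, inflating their p@10.
rng = np.random.default_rng(0)
n_items, k, n_test, n_trials = 1000, 10, 5, 2000

for n_train in (10, 500):                      # "light" vs. "heavy" user
    hits = []
    for _ in range(n_trials):
        scores = rng.random(n_items)
        train = rng.choice(n_items, n_train, replace=False)
        unseen = np.setdiff1d(np.arange(n_items), train)
        test = rng.choice(unseen, n_test, replace=False)
        scores[train] = -np.inf                # filter training positives only
        top_k = np.argpartition(-scores, k)[:k]
        hits.append(len(np.intersect1d(top_k, test)) / k)
    print(f"{n_train} train items -> filtered p@10 ≈ {np.mean(hits):.4f}")
```

Under random scoring the expected filtered p@10 is n_test / (n_items - n_train), so the heavy user's number comes out roughly double the light user's, purely as an artifact of the filtering.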

I'd be curious to know whether there's a way to estimate p@k on held-out data that is theoretically justified.

-Alberto
