Filtering train set in test_avg_metrics #20
Comments
Hi @bkj, I'm not sure I understand precisely what you're asking, but it wouldn't make sense for matrix factorization models to include items in the test metrics that do not appear in training, since you would then have no latent factors with which to make test predictions. Instead, the test dataset should contain (user, item) pairs that do not appear in the training data, and evaluation computes ranking metrics per user on this subset of the data (filtering out all rows whose user or item did not appear in the training data). The metrics are averaged over all test users, and there's an option to use a smaller number of test users, since this can be costly when there are many users. Hope this helps. Alberto
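For concreteness, here's a minimal Python sketch of the evaluation flow described above: keep only test pairs whose user and item were seen in training, compute a per-user ranking metric, and average over an optional subsample of test users. The names (`predict_scores`, `num_test_users`, etc.) are illustrative, not qmf's actual API.

```python
# Minimal sketch of per-user ranking evaluation over a trained MF model.
# Names (predict_scores, num_test_users) are illustrative, not qmf's API.
import random
from collections import defaultdict

def avg_precision_at_k(predict_scores, train_pairs, test_pairs,
                       k=10, num_test_users=None, seed=0):
    """predict_scores(user) -> {item: score} for items seen in training.
    train_pairs / test_pairs are iterables of (user, item) tuples."""
    train_users = {u for u, _ in train_pairs}
    train_items = {i for _, i in train_pairs}

    # Drop test rows whose user or item never appeared in training:
    # the model has no latent factors for them.
    test_by_user = defaultdict(set)
    for u, i in test_pairs:
        if u in train_users and i in train_items:
            test_by_user[u].add(i)

    users = sorted(test_by_user)
    if num_test_users is not None and num_test_users < len(users):
        users = random.Random(seed).sample(users, num_test_users)

    precisions = []
    for u in users:
        scores = predict_scores(u)  # e.g. dot products of user/item factors
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        precisions.append(len(set(top_k) & test_by_user[u]) / k)
    return sum(precisions) / len(precisions) if precisions else 0.0
```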
To compute p@k, you take the top K predictions and look at the overlap between those predictions and the actual observed values in the test set. However, the top K predictions usually contain items that were observed in the train set, and so by definition are not in the test set. Usually I take the top K predictions AFTER filtering out user-item pairs that appear in the training set; otherwise p@k is artificially reduced. Does that make sense?
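As a rough illustration of this point (again just a sketch, not qmf code): the same scores give very different p@k depending on whether the user's training items are allowed to occupy the top-k slots.

```python
# Sketch of p@k for a single user, with and without filtering out
# items the user already interacted with in the training set.
import numpy as np

def precision_at_k(scores, train_items, test_items, k=10, filter_train=True):
    """scores: 1-D array of predicted scores, one entry per item.
    train_items / test_items: sets of item indices for this user."""
    scores = np.asarray(scores, dtype=float).copy()
    if filter_train and train_items:
        # Push already-seen items to the bottom so they never enter the top-k.
        scores[list(train_items)] = -np.inf
    top_k = np.argsort(-scores)[:k]
    return len(set(top_k) & set(test_items)) / k

scores = [0.9, 0.8, 0.7, 0.2, 0.1]   # toy predictions for 5 items
train = {0, 1}                        # seen during training
test = {2, 3}                         # held-out positives
print(precision_at_k(scores, train, test, k=2, filter_train=False))  # 0.0
print(precision_at_k(scores, train, test, k=2, filter_train=True))   # 1.0
```

In the unfiltered case the training items crowd out the top-k, so p@k is depressed even though the model ranks the held-out positives next.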
Gotcha, I wasn't aware of this optimization. Do you have pointers to papers/implementations discussing this? -Alberto
No papers off the top of my head, but I know they do it in dsstne (and probably other places). A script to do the filtering is here. On the example I'm running (MovieLens-1M), doing this filtering increases p@10 from ~0.1 to ~0.25 -- so it's a nontrivial improvement, and I think it's the right way to do evaluation. ~ Ben
Hmm, this might be worth including, but at the same time I'm not convinced it's the right way to do evaluation either: it might artificially boost p@k for different users in different ways, depending on how many positive items appear for each user in training (you would only filter out the user's known positives, not negatives they may have seen, so a user with many training positives ends up with a much smaller candidate pool than a user with few). I'd be curious to know whether there is a way to estimate p@k on held-out data that is theoretically justified. -Alberto
Hi all --
In some other recommender systems, there's a flag to filter the items in the training set from the test metrics -- is there something like that in qmf?
That is, it doesn't make sense to compute p@k on the test set if we allow the top-k predictions to contain items that we observed in the train set, and therefore know won't appear in the test set.
Thanks