Is eval_num_neg implemented? #7

Open
ghost opened this issue Oct 7, 2016 · 5 comments

ghost commented Oct 7, 2016

According to the documentation:

--eval_num_neg (default 3): number of random negatives per positive used to generate the fixed evaluation sets mentioned above

However, I see it only once in the code, in a DEFINE:
./qmf/bpr.cpp:42:DEFINE_uint64(eval_num_neg, 3, "number of negatives generated per positive in evaluation");

What is the purpose of this flag?

albietz (Contributor) commented Oct 14, 2016

Hi.

The flag is passed to the BPREngine constructor (https://github.com/quora/qmf/blob/master/qmf/bpr.cpp#L101). The training objective consists of expectations over all pairs of (positive, negative) items, so computing it exactly at each epoch would be expensive. Instead, we evaluate on a fixed Monte Carlo approximation of these expectations, where eval_num_neg is the number of negative items sampled for each positive item.
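
To make that concrete, here is a rough sketch of the idea (hypothetical code, not qmf's actual implementation; the names EvalTriple, makeEvalSet, and evalLoss are made up for illustration): build the evaluation set once by sampling eval_num_neg negatives per positive, then re-score the same triples after each epoch to track the approximate loss.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// One (user, positive item, sampled negative item) triple of the fixed eval set.
struct EvalTriple {
  std::size_t user;
  std::size_t posItem;
  std::size_t negItem;
};

// Build the fixed evaluation set once, before training: for every observed
// (user, item) pair, sample `evalNumNeg` negative items (--eval_num_neg).
std::vector<EvalTriple> makeEvalSet(
    const std::vector<std::pair<std::size_t, std::size_t>>& positives,
    std::size_t numItems,
    std::size_t evalNumNeg,
    std::mt19937& gen) {
  std::uniform_int_distribution<std::size_t> itemDist(0, numItems - 1);
  std::vector<EvalTriple> evalSet;
  evalSet.reserve(positives.size() * evalNumNeg);
  for (const auto& p : positives) {
    for (std::size_t k = 0; k < evalNumNeg; ++k) {
      // A real implementation would reject items the user has already interacted with.
      evalSet.push_back({p.first, p.second, itemDist(gen)});
    }
  }
  return evalSet;
}

// Approximate BPR loss on the fixed set: the average of -log(sigmoid(s_ui - s_uj))
// over the sampled triples.
double evalLoss(const std::vector<EvalTriple>& evalSet,
                const std::vector<std::vector<double>>& userFactors,
                const std::vector<std::vector<double>>& itemFactors) {
  auto dot = [](const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
  };
  double loss = 0.0;
  for (const auto& t : evalSet) {
    const double diff = dot(userFactors[t.user], itemFactors[t.posItem]) -
                        dot(userFactors[t.user], itemFactors[t.negItem]);
    loss += std::log1p(std::exp(-diff));  // equals -log(sigmoid(diff))
  }
  return evalSet.empty() ? 0.0 : loss / evalSet.size();
}
```

Because the set is fixed, the reported loss changes only because the factors change, which makes it a usable convergence signal across epochs.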

ghost (Author) commented Oct 14, 2016

Agreed. However, what is the purpose of --eval_num_neg?

When evaluating on the test data, the number of negative examples should not matter. Or is it used for something else? I'm not sure why there are two parameters for negative examples:

--eval_num_neg
--num_negative_samples

albietz (Contributor) commented Oct 14, 2016

num_negative_samples is used at training time when sampling negative examples in each SGD iteration. eval_num_neg is used only once, at the beginning, to generate the training and test evaluation sets for computing approximate training and test losses (note that it isn't used in the computation of the other average test metrics like AUC, AP, etc.).
These loss values are mainly useful for checking convergence (training loss) and overfitting (test loss), since they approximately correspond to what the algorithm is optimizing, while the other average metrics are more of a final measure of how well the algorithm is doing.
Leaving eval_num_neg = 3 is a reasonable choice in most settings. In contrast, changing num_negative_samples can have some impact on the final results or the convergence speed.
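
For example, a hypothetical invocation (assuming the bpr binary produced by the build; the value 10 is purely illustrative, and the dataset, output, and other required flags are elided) could set both flags like this:

```
./bin/bpr --num_negative_samples=10 --eval_num_neg=3 ...
```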

ghost (Author) commented Oct 14, 2016

Thank you for clarifying. I'm using precision and recall, so eval_num_neg should have no effect.

Thank you.

albietz (Contributor) commented Oct 15, 2016

Indeed. Thanks for raising the issue - I've added some clarifications in the readme.
