Using cross entropy loss to calculate DPO? #67
I have a similar question about getting the average of the losses.

If the loss is averaged over the tokens in each sequence and then averaged over the sequences, is there any problem with doing it that way? In my tests it is not as good as using the sum.
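For concreteness, here is a minimal sketch (not code from this repo) of the two aggregation choices being compared. The names `per_token_logps` and `loss_mask` are hypothetical stand-ins for the per-token target log-probabilities and the padding mask:

```python
import torch

def sequence_logps(per_token_logps: torch.Tensor,
                   loss_mask: torch.Tensor,
                   average: bool = False) -> torch.Tensor:
    """Aggregate per-token log-probs into one score per sequence.

    per_token_logps: (batch, seq_len) log p(y_t | x, y_<t) per target token.
    loss_mask:       (batch, seq_len) 1.0 where the token counts, 0.0 at padding.
    """
    masked = per_token_logps * loss_mask
    if average:
        # Mean over tokens: length-normalizes each sequence. This changes
        # the objective relative to the paper, which uses log pi(y|x),
        # i.e. the *sum* of the per-token log-probs.
        return masked.sum(-1) / loss_mask.sum(-1)
    # Sum over tokens: the sequence log-probability the DPO loss is built on.
    return masked.sum(-1)
```

Under this reading, the sum is what the DPO objective calls for, which would be consistent with the observation that averaging performs worse in practice.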
I think that you can replace most of the logic in `_get_batch_logps` with `torch.nn.CrossEntropyLoss` and then, to get the average instead of the sum, divide by the number of tokens that are not ignored. This would remove some of the bespoke code from the repo. Do you think this is a correct interpretation of the maths in the paper?
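As a hedged illustration of this suggestion, the sketch below computes per-sequence log-probabilities with `F.cross_entropy` instead of a hand-rolled gather. It assumes labels use `-100` to mark ignored positions; the function name and signature here are illustrative, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def get_batch_logps(logits: torch.Tensor,
                    labels: torch.Tensor,
                    average_log_prob: bool = False) -> torch.Tensor:
    """Per-sequence log-probs computed via cross entropy.

    logits: (batch, seq_len, vocab) from the model.
    labels: (batch, seq_len); positions to ignore are set to -100.
    """
    # Shift so that tokens < t predict token t.
    labels = labels[:, 1:]
    logits = logits[:, :-1, :]
    loss_mask = labels != -100

    # cross_entropy returns -log p(label) per token; with reduction="none"
    # and ignore_index=-100, ignored positions come back as exactly 0.
    per_token_logps = -F.cross_entropy(
        logits.transpose(1, 2),  # (batch, vocab, seq_len-1)
        labels,
        reduction="none",
        ignore_index=-100,
    )

    if average_log_prob:
        # The proposal above: average over the non-ignored tokens.
        return per_token_logps.sum(-1) / loss_mask.sum(-1)
    # Sum matches log pi(y|x) as used in the DPO objective.
    return per_token_logps.sum(-1)
```

With `average_log_prob=False` this should agree (up to numerics) with a gather-based implementation, since cross entropy over the target token is exactly the negative per-token log-probability; the averaging branch is the commenters' variant, not the paper's objective.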