
Padding token is not ignored during training and evaluation #64

Open
bdusell opened this issue Jun 10, 2022 · 0 comments
bdusell commented Jun 10, 2022

I think I've found some bugs with how the <pad> token is used during training and evaluation.

First, when computing cross-entropy loss on the training and validation sets with F.cross_entropy, the padding index is not ignored. So the model is trained to output some number of <pad>s after <eos> in some batches, and it is expected to learn to output an unbounded number of <pad>s after <eos> on the validation set. As I understand it, relatively few batches contain <pad> symbols thanks to the use of torchtext.data.BucketIterator, but how much the model is trained and penalized on <pad> tokens depends on arbitrary batching decisions, which I think could lead to an exposure bias problem.
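For reference, here is a minimal sketch of the fix I have in mind: passing `ignore_index` to `F.cross_entropy` so that losses at <pad> positions are masked out of the mean. The `PAD_IDX` value and the toy tensors are made up for illustration; the real padding index would come from the vocabulary.

```python
import torch
import torch.nn.functional as F

PAD_IDX = 0  # hypothetical padding index; in practice, look it up in the vocab

torch.manual_seed(0)
# Toy logits for a batch of 2 sequences of length 3 over a vocab of 4 tokens.
logits = torch.randn(2, 3, 4)
# The second sequence ends early, so its last target position is <pad>.
targets = torch.tensor([[2, 3, 1],
                        [2, 1, PAD_IDX]])

# Without ignore_index, the <pad> position contributes to the loss,
# so the objective depends on how much padding the batch happened to need.
loss_with_pad = F.cross_entropy(logits.view(-1, 4), targets.view(-1))

# With ignore_index, <pad> positions are excluded from the mean entirely.
loss_no_pad = F.cross_entropy(logits.view(-1, 4), targets.view(-1),
                              ignore_index=PAD_IDX)
```

With `ignore_index` set, the result is identical to computing the loss over only the non-<pad> positions, so batching decisions no longer affect the objective.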

IMO the more problematic bug is that the padding index is not ignored when computing full-sentence accuracy. If the model does not output <pad> at every position after <eos>, the whole sequence is counted as incorrect, and the likelihood of this happening depends on how the data is batched. I think this might partly explain why the full-sentence accuracy scores are 0 for the baselines in Mueller et al. (2022).
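A sketch of what a padding-aware full-sentence accuracy could look like: treat positions where the target is <pad> as automatic matches, so only real tokens have to agree. The helper name `sequence_accuracy` and the `PAD_IDX` value are my own illustration, not names from this codebase.

```python
import torch

PAD_IDX = 0  # hypothetical padding index

def sequence_accuracy(predictions, targets, pad_idx=PAD_IDX):
    """Fraction of sequences whose non-<pad> positions are all correct.

    Padded target positions are treated as matches, so the score no longer
    depends on how much padding each batch happened to need.
    """
    pad_mask = targets == pad_idx
    correct = (predictions == targets) | pad_mask  # ignore padded positions
    return correct.all(dim=1).float().mean().item()

# The second prediction disagrees with the target only at a padded
# position; with the mask it still counts as a fully correct sequence.
preds   = torch.tensor([[2, 3, 1], [2, 1, 3]])
targets = torch.tensor([[2, 3, 1], [2, 1, PAD_IDX]])
print(sequence_accuracy(preds, targets))  # → 1.0
```

Without the `pad_mask` term, the same example would score 0.5, which is exactly the batching-dependent behavior described above.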
