I think I've found some bugs with how the <pad> token is used during training and evaluation.
First, when computing cross-entropy loss on the training and validation sets with F.cross_entropy, the padding index is not ignored. So, the model is trained to output some number of <pad>s after <eos> in some batches, and on the validation set it is penalized unless it keeps outputting <pad>s after <eos>. As I understand it, relatively few batches contain <pad> symbols due to the use of torchtext.data.BucketIterator, but how much the model is trained and penalized on <pad> tokens depends on arbitrary batching decisions, which I think could lead to an exposure bias problem.
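For concreteness, here is roughly the fix I have in mind (a minimal sketch, not the actual code in this repo; I'm assuming `pad_idx` holds the vocabulary index of `<pad>`). F.cross_entropy has an `ignore_index` argument that makes target positions equal to that index contribute nothing to the loss or the gradient:

```python
import torch
import torch.nn.functional as F

# Toy setup; pad_idx is assumed to be the index of <pad> in the vocabulary.
pad_idx = 0
vocab_size, batch_size, seq_len = 10, 2, 5

# Flattened logits and gold tokens, as typically passed to F.cross_entropy.
logits = torch.randn(batch_size * seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size * seq_len,))

# ignore_index skips every position whose target is <pad>, so the loss
# no longer trains the model to predict <pad>s after <eos>.
loss = F.cross_entropy(logits, targets, ignore_index=pad_idx)
```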
IMO the more problematic bug is that the padding index is not ignored when computing full-sentence accuracy. If the model does not output all <pad>s after <eos>, the sentence is not counted as correct, and the likelihood of this happening depends on how the data is batched. I think this might partly explain why the full-sentence accuracy scores are 0 for the baselines in Mueller et al. (2022).
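And here is one way the accuracy computation could mask out padding (again a hypothetical sketch with assumed names and shapes, not the repo's code): a sentence counts as correct if every position where the target is not `<pad>` is predicted correctly.

```python
import torch

pad_idx = 0  # assumed index of <pad>

# Predicted and gold token ids, shape (batch_size, seq_len).
predictions = torch.tensor([[5, 6, 2, 9, 9],   # differs only at <pad> positions
                            [5, 7, 2, 0, 0]])  # a genuine mistake at position 1
targets     = torch.tensor([[5, 6, 2, 0, 0],
                            [5, 6, 2, 0, 0]])

# A position is fine if it matches the target or the target is <pad>.
correct_or_pad = (predictions == targets) | (targets == pad_idx)

# A sentence is correct only if all non-<pad> positions match.
sentence_correct = correct_or_pad.all(dim=1)
full_sentence_acc = sentence_correct.float().mean()
print(full_sentence_acc.item())  # 0.5: sentence 1 counts despite not emitting <pad>s
```

With this masking, whatever the model emits after <eos> in the padded region no longer decides whether a sentence counts as correct.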
bdusell added a commit to bdusell/transductions that referenced this issue on Jun 10, 2022.