I think I've found some bugs with how the <pad> token is used during training and evaluation.
First, when computing cross-entropy loss on the training and validation sets with F.cross_entropy, the padding index is not ignored. So, the model is trained to output some number of <pad>s after <eos> in some batches, and on the validation set it is penalized unless it keeps outputting <pad>s after <eos>. As I understand it, relatively few batches contain <pad> symbols due to the use of torchtext.data.BucketIterator, but how much the model is trained and penalized on <pad> tokens depends on arbitrary batching decisions, which I think could lead to an exposure bias problem.
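For concreteness, here is roughly the fix I have in mind (a minimal sketch, not the actual code in this repo; I'm assuming `pad_idx` holds the vocabulary index of `<pad>`). F.cross_entropy has an `ignore_index` argument that makes target positions equal to that index contribute nothing to the loss or the gradient:

```python
import torch
import torch.nn.functional as F

# Toy setup; pad_idx is assumed to be the index of <pad> in the vocabulary.
pad_idx = 0
vocab_size, batch_size, seq_len = 10, 2, 5

# Flattened logits and gold tokens, as typically passed to F.cross_entropy.
logits = torch.randn(batch_size * seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size * seq_len,))

# ignore_index skips every position whose target is <pad>, so the loss
# no longer trains the model to predict <pad>s after <eos>.
loss = F.cross_entropy(logits, targets, ignore_index=pad_idx)
```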
IMO the more problematic bug is that the padding index is not ignored when computing full-sentence accuracy. If the model does not output all <pad>s after <eos>, the sentence is not counted as correct, and the likelihood of this happening depends on how the data is batched. I think this might partly explain why the full-sentence accuracy scores are 0 for the baselines in Mueller et al. (2022).
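And here is one way the accuracy computation could mask out padding (again a hypothetical sketch with assumed names and shapes, not the repo's code): a sentence counts as correct if every position where the target is not `<pad>` is predicted correctly.

```python
import torch

pad_idx = 0  # assumed index of <pad>

# Predicted and gold token ids, shape (batch_size, seq_len).
predictions = torch.tensor([[5, 6, 2, 9, 9],   # differs only at <pad> positions
                            [5, 7, 2, 0, 0]])  # a genuine mistake at position 1
targets     = torch.tensor([[5, 6, 2, 0, 0],
                            [5, 6, 2, 0, 0]])

# A position is fine if it matches the target or the target is <pad>.
correct_or_pad = (predictions == targets) | (targets == pad_idx)

# A sentence is correct only if all non-<pad> positions match.
sentence_correct = correct_or_pad.all(dim=1)
full_sentence_acc = sentence_correct.float().mean()
print(full_sentence_acc.item())  # 0.5: sentence 1 counts despite not emitting <pad>s
```

With this masking, whatever the model emits after <eos> in the padded region no longer decides whether a sentence counts as correct.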
bdusell added a commit to bdusell/transductions that referenced this issue on Jun 10, 2022.