
BERT #57

Open
jopetty opened this issue Feb 3, 2021 · 2 comments

jopetty commented Feb 3, 2021

Would be nice to have BERT as an option for the encoder. Some issues are:

  • BERT uses its own tokenizer, which probably doesn't play nicely with the Fields we've been using (see the tokenizer sketch after this list).
  • How do we get access to the source vocabulary?
  • Do we always need to have the transformation token included in the target vocabulary, since BERT's tokenizer might do weird things to it?
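For reference, a minimal sketch (not code from this repo) of how the HuggingFace tokenizer could expose the source vocabulary and keep a transformation token in one piece; the token name `<transform>` is just a placeholder:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Without registration, a custom token gets split on punctuation / WordPieces,
# e.g. ['<', 'transform', '>'].
print(tokenizer.tokenize("<transform>"))

# Registering it as an additional special token keeps it as a single piece.
tokenizer.add_special_tokens({"additional_special_tokens": ["<transform>"]})
print(tokenizer.tokenize("<transform>"))  # ['<transform>']

# The source vocabulary is available as a token -> id mapping.
source_vocab = tokenizer.get_vocab()
print(len(source_vocab), source_vocab["<transform>"])
```

If we go this route, adding tokens would also mean resizing the model's embedding matrix with `model.resize_token_embeddings(len(tokenizer))`.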
jopetty added this to the Stable 1.0 milestone Feb 3, 2021

jopetty commented Feb 12, 2021

Commit aedca6f adds a "working" (i.e., it runs without errors) BERT model, but it doesn't seem to learn very well. Among the design considerations:

  • Wiring up the actual config parameters in the YAML file so they do something. Not sure how much flexibility we have with the pre-trained HuggingFace models, but at the very least we shouldn't have extraneous options.
  • It seems best (for training time) to freeze the layers of the BERT encoder, but maybe this should be a user-configurable option as well? (See the sketch after this list.)
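As a rough sketch, assuming a hypothetical `freeze_encoder` option in the YAML config (not an existing key), the freezing logic could look something like:

```python
import torch.nn as nn
from transformers import BertModel

def build_bert_encoder(freeze_encoder: bool = True) -> nn.Module:
    """Load a pretrained BERT encoder, optionally freezing its weights."""
    bert = BertModel.from_pretrained("bert-base-uncased")
    if freeze_encoder:
        # Frozen parameters are excluded from gradient updates, which keeps
        # training fast and leaves the pretrained weights untouched.
        for param in bert.parameters():
            param.requires_grad = False
    return bert
```

Only the parameters that still have `requires_grad=True` would then need to be handed to the optimizer.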


jopetty commented Feb 19, 2021

It seems that the positional encodings built into the HuggingFace BERT models are not useful in a sequence-to-sequence context. I'm not really sure why this is, but it is fixable if we add our own positional encodings to the embedding layer of the pretrained models.
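A minimal sketch of what that fix could look like, using standard sinusoidal encodings added on top of BERT's embedding output (the module name and wiring are assumptions, not necessarily what the commit does):

```python
import math
import torch
import torch.nn as nn

class AddedPositionalEncoding(nn.Module):
    """Add fixed sinusoidal positional encodings to BERT's embedding output."""

    def __init__(self, d_model: int = 768, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)  # (max_len, d_model), not trained

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, d_model), e.g. the output of bert.embeddings
        return embeddings + self.pe[: embeddings.size(1)]
```

The encoder would then apply this module to the embedding output before it reaches BERT's transformer layers.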
