
Model Parallelism with TrainState #1988

Answered by marcvanzee
agemagician asked this question in Q&A

There is no reason in principle why TrainState could not be used with model parallelism.

The reason this doesn't work well (yet) in HuggingFace is, I believe, specific to their API, and it only matters for very large models. They are already discussing this in huggingface/transformers#15766.

If you are looking for a simple example that uses TrainState with pjit, this example may be useful: https://colab.sandbox.google.com/github/marcvanzee/flax/blob/pjit-example/examples/siren/siren.ipynb

@patrickvonplaten

Answer selected by agemagician