integration with mup? #2066
Unanswered
nestordemeure asked this question in Q&A
Is there any interest in integrating maximal update parametrization (mup) with Flax? It is a way to alter parameter initialization and the optimizer such that the optimal values of hyperparameters (such as the learning rate) stay stable across model sizes. The work is interesting in that it lets you tune your hyperparameters on a small model and then train your large model with them (furthermore, it stabilizes the parameters across training, which has intriguing applications for optimizer research).

The existing implementation is based on PyTorch: it introduces a parameter initialization function (a code pattern similar to the one already used in Flax) and a modified optimizer.
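To give a flavor of what the initialization side could look like in Flax, here is a minimal, hypothetical sketch (not the actual mup rules, which involve more careful per-layer scaling; all names below are made up):

```python
# Hypothetical sketch of mup-flavored initialization in Flax: hidden kernels
# are drawn with standard deviation proportional to 1 / sqrt(fan_in), and the
# readout is rescaled by 1 / width so the logits keep the same scale as the
# model is widened. Not the official mup rules.
import jax
import jax.numpy as jnp
import flax.linen as nn

def fan_in_init(scale=1.0):
    """Kernel initializer whose variance scales as 1 / fan_in."""
    def init(key, shape, dtype=jnp.float32):
        fan_in = shape[0]  # Dense kernels have shape (in_features, out_features).
        return (scale / jnp.sqrt(fan_in)) * jax.random.normal(key, shape, dtype)
    return init

class MupMLP(nn.Module):
    width: int
    n_out: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(self.width, kernel_init=fan_in_init())(x))
        # Extra 1 / width factor on the readout so the output scale stays
        # stable when `width` grows.
        return nn.Dense(self.n_out, kernel_init=fan_in_init())(x) / self.width

model = MupMLP(width=256, n_out=10)
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 32)))
```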
I believe this could be a good fit for Flax: Flax is already used to train very large models (such as PaLM), it relies on Optax for its optimization (Optax optimizers are composable, so one could introduce a wrapper to make them mup compatible, along the lines of the sketch below), and it already has a separate weight-initialization pattern (whereas mup had to introduce that as a foreign concept in PyTorch). The port might therefore be relatively straightforward.
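On the optimizer side, a wrapper could be assembled from existing composable Optax pieces; a minimal sketch, assuming a placeholder labeling heuristic rather than mup's actual parameter partitioning:

```python
# Hypothetical sketch of a mup-style Optax wrapper: optax.multi_transform
# applies a width-scaled learning rate to matrix-like parameters (mup's rule
# for Adam on hidden weights) and the base learning rate to everything else.
# The label heuristic below is a placeholder, not the actual mup partitioning.
import jax
import optax

def mup_adam(base_lr, width):
    transforms = {
        'matrix': optax.adam(base_lr / width),  # learning rate shrinks with width
        'other': optax.adam(base_lr),
    }
    def label_fn(params):
        # Placeholder heuristic: treat 2D kernels as hidden matrices.
        return jax.tree_util.tree_map(
            lambda p: 'matrix' if p.ndim == 2 else 'other', params)
    return optax.multi_transform(transforms, label_fn)

optimizer = mup_adam(base_lr=1e-3, width=256)
opt_state = optimizer.init(params)  # `params` from the sketch above
```

A real wrapper would need module metadata to distinguish input, hidden, and output weights, rather than a shape-based heuristic.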
I have asked the mup team, and they appear available and motivated to cooperate on a Flax version.

Replies: 1 comment
I think this type of cutting-edge research is really cool, and it is very exciting to see it being applied to Flax as well! I suppose the work will consist of extending their API so that it can handle not just PyTorch modules but Flax modules as well. We are more than happy to advise on specific questions that arise, or on limitations of Flax that pop up, filed either as GitHub Issues or GitHub Discussions. Given that there does not seem to be anything actionable for our team right now, I have converted your Issue to a Discussion. Please let me know if you have any further questions or would like to add anything!