Question about LBFGS #307
Hello JAXopt Team,
Replies: 1 comment, 3 replies
Hi DoTulip,

The tool `jaxopt.ScipyMinimize` is just a wrapper around Scipy - it is equivalent to calling `scipy.optimize.minimize` on your function directly (the same code runs under the hood). In particular, this code is not jittable and does not benefit from GPU/TPU speed-up. The one exception is that it is actually possible to differentiate through the wrapper, thanks to implicit differentiation.
The tool `jaxopt.LBFGS` is a pure re-implementation of L-BFGS in JAX: it is differentiable, runs on GPU/TPU, and can be wrapped in `jax.jit`. This should be your preferred tool if performance is an issue (it is definitely what you want to use for the `train_step` function of your neural network).

For neural networks, I assume you are interested in a stochastic variant of…