Trying to accelerate LBFGS #13

Open
muammar opened this issue Apr 6, 2019 · 1 comment

muammar commented Apr 6, 2019

Thanks for this implementation. Recently I've been working on a package that applies machine learning to chemistry problems, using PyTorch to train the models. I've been able to perform distributed training with a library called dask, which accelerated the training phase.

When I use first-order optimization algorithms such as Adam, I can get up to 3 optimization steps per second (but those algorithms converge slowly compared to second-order ones). When using LBFGS I get only 1 optimization step every 7 seconds for the same number of parameters. I am interested in using a dask client to run some parts of your LBFGS implementation in a distributed manner so that each optimization step is faster. I have started reading the code and have only a rough understanding of the L-BFGS algorithm, so I wondered if you could give me some hints about which parts of the module could be computed independently and therefore distributed?

I would appreciate your thoughts on this.

hjmshi commented Apr 9, 2019

Thanks for your question! I'm not too familiar with dask and am not sure about your problem setting. Can you clarify what problem you are looking at? Is it a finite-sum problem?

Typically, SGD/Adam are distributed in a data-parallel fashion, where each node evaluates the function/gradient over only a subset of the dataset and the results are then aggregated. Something similar can be done for L-BFGS, although there are various possible approaches for dealing with the two-loop recursion, line search, etc. However, this approach makes sense only if function/gradient evaluations are the primary bottleneck in computation (as they are in deep learning). If you can share some additional details about your problem, I may be able to give better ideas for distributing the algorithm.
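For illustration, here is a minimal sketch (not part of this repository) of the data-parallel pattern described above, assuming a finite-sum objective split into data shards: each dask worker evaluates the loss and gradient on its own shard, and the driver sums the results into a single gradient that an L-BFGS step could consume. `build_model`, the loss, and the shards are hypothetical placeholders.

```python
import torch
from dask.distributed import Client


def build_model():
    # Placeholder model; in practice this would be the user's network.
    return torch.nn.Linear(10, 1)


def shard_loss_grad(flat_params, x, y):
    """Runs on a worker: rebuild the model, load the parameter vector,
    and return (loss, flattened gradient) for this shard."""
    model = build_model()
    torch.nn.utils.vector_to_parameters(flat_params, model.parameters())
    loss = torch.nn.functional.mse_loss(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())
    return loss.item(), torch.nn.utils.parameters_to_vector(grads)


def aggregated_loss_grad(client, flat_params, shards):
    """Submit one task per shard, then sum losses and gradients on the driver.
    Summation corresponds to a sum-form (finite-sum) objective."""
    futures = [client.submit(shard_loss_grad, flat_params, x, y) for x, y in shards]
    results = client.gather(futures)
    total_loss = sum(loss for loss, _ in results)
    total_grad = torch.stack([g for _, g in results]).sum(dim=0)
    return total_loss, total_grad


if __name__ == "__main__":
    client = Client()  # local dask cluster for demonstration
    model = build_model()
    flat_params = torch.nn.utils.parameters_to_vector(model.parameters())
    # Hypothetical data shards; in practice these would live on the workers.
    shards = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]
    loss, grad = aggregated_loss_grad(client, flat_params, shards)
    print(loss, grad.shape)
```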
