Implementation of (1 + lambda) #288

Open
siddn opened this issue Feb 3, 2025 · 5 comments

@siddn

siddn commented Feb 3, 2025

Does the current code base provide an implementation of (1 + lambda) or (1 + 1) CMA-ES? I know the recommended implementation is the standard (mu, lambda); however, I would like to evaluate our problem with the other two methods, as we have a higher tolerance for unideal local optima. In the current CMAOptions(), I see there are settings for parent number and elitism, but I'm unsure whether these will use the step-size adaptation described in this paper (Section 2.1, Algorithm 1, Procedure updateStepSize). Any guidance on this would be appreciated!

@nikohansen
Contributor

nikohansen commented Feb 5, 2025

Does the current code base provide an implementation of (1 + lambda) or (1 + 1) CMA-ES?

No, it does not. The closest functionality provided would be using the option {'CMA_elitist': True}. This does not change the step-size adaptation mechanism, but I am also not really convinced that it "should". It is, however, a code feature that has not been thoroughly benchmarked.

as we have a higher tolerance for unideal local optima

I don't quite understand what this means or why it implies that (1+1)-selection would be better suited. My first approach to finding a local optimum quickly would be to set a relatively small initial step-size, the third argument of cma.fmin2.

@siddn
Author

siddn commented Feb 5, 2025

Thanks for the information.

As to the second part, I can elaborate a little on our specific use case. Our problem involves testing several interactions between a user and a robotic device, and we want to find the best device parameters for a given user as defined by our objective function. Some key points:

  • Our problem is 3-dimensional, so relative to many other test problems our search space is rather small.
  • The "function evaluations" are extremely expensive, i.e. it can take as much as several minutes to evaluate each candidate solution. To this end, we are more interested in finding a "good enough" solution with minimal wall time spent optimizing than in finding a global solution. Additionally, we want the user to spend as much time in a "close enough" region as possible, without sampling too far away from our expected optimum. (In this case, I think your advice on starting with a small initial step-size seems to be the best place to start.)
  • We also think our users will adapt to the device over time, giving us a dynamic landscape for our objective function. This is primarily why we want to use CMA-ES over another optimizer.
  • I've implemented the (1+lambda) CMA-ES as best I can to test it on some offline problems, and it seems that it actually ends up being worse for dynamic problems, as the elitist solution (if we preserve its old function value and don't re-evaluate) tends to dominate and stagnate the search. Does this match your intuition given the design of the (1+lambda) implementation?
  • If the above is true, are there general recommendations for how the parameters of CMA-ES should be selected for problems with dynamic landscapes, where there is more interest in tracking a local minimum than hunting for a global minimum?

Thank you again for the replies 😄

@nikohansen
Contributor

The "function evaluations" are extremely expensive, i.e. it can take as much as several minutes to evaluate each candidate solution. To this end, we are more interested in finding a "good enough" solution with minimal wall time spent optimizing than in finding a global solution. Additionally, we want the user to spend as much time in a "close enough" region as possible, without sampling too far away from our expected optimum. (In this case, I think your advice on starting with a small initial step-size seems to be the best place to start.)

It's always a good idea to use cma.plot to display the tracked state variables. It will in particular give a good indication whether the initial step-size was chosen well.

We also think our users will adapt to the device over time, giving us some dynamic landscape for our objective function. This is primarily why we want to use CMA-ES over another optimizer.

This looks like one of the worst-case scenarios for having elitism without re-evaluations.

I've implemented the (1+lambda) CMA-ES as best I can to test it on some offline problems, and it seems that it actually ends up being worse for dynamic problems, as the elitist solution (if we preserve its old function value and don't re-evaluate) tends to dominate and stagnate the search. Does this match your intuition given the design of the (1+lambda) implementation?

I'd say so.

If the above is true, are there general recommendations for how the parameters of CMA-ES should be selected for problems with dynamic landscapes, where there is more interest in tracking a local minimum than hunting for a global minimum?

The main "issue" would be that the step-size becomes too small when either the landscape is not moving for some time, or when it is moving too fast (which can then appear as "noise"), in which case the step-size should instead increase. Using the implemented noise handling should at least address the second scenario.

If you have control over the dynamic changes, it also seems highly advisable to make changes only between CMA-ES iterations and not within one.

@siddn
Author

siddn commented Feb 21, 2025

Thank you for the responses; this has been very helpful. I did have a few more general questions, if you'd be willing. In general, I understand CMA-ES is quite good (and generally consistent) on higher-dimensional problems, and is more tolerant of potentially dynamic landscapes than a method like Bayesian optimization. However, do you have recommendations for a system with a low-dimensional search space (<10) and few function evaluations (~100-150) that would still retain CMA-ES's tolerance for dynamic landscapes? Thanks!

@nikohansen
Contributor

Generally, some tweaked versions of Nelder-Mead are very good in low dimension, in particular below five, and SLSQP (available in scipy.optimize) is very good with low budgets. How they hold up in a dynamic landscape is hard to say; Nelder-Mead probably better than SLSQP.
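Both methods are available through `scipy.optimize.minimize`. A sketch with a placeholder objective and an evaluation budget in the range mentioned above:

```python
import numpy as np
from scipy.optimize import minimize

def sphere(x):  # placeholder for the real objective
    return float(np.sum(np.asarray(x) ** 2))

x0 = np.full(3, 0.5)

# Nelder-Mead with a capped evaluation budget (~150 evaluations)
res_nm = minimize(sphere, x0, method='Nelder-Mead',
                  options={'maxfev': 150})

# SLSQP, typically very sample-efficient on smooth problems
res_slsqp = minimize(sphere, x0, method='SLSQP')
```

Note that SLSQP estimates gradients by finite differences, so each of its iterations consumes several function evaluations in addition to the line search.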
