Error checking for #1461 #1462
Hi @Game4Move78! Thank you for your pull request and welcome to our community. Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
We might add This should solve the issue, however it means that the user might specify "I want llambda to be 20" and Nevergrad decides to set llambda to 30.
Nevergrad may ignore user specified llambda if fewer than num_workers
Your code looks good to me; the problem might be in MixDeterministicRL. I'll investigate. Thanks for your work.
@@ -158,6 +159,8 @@ def _internal_ask_candidate(self) -> p.Parameter:
self.population[candidate.uid] = candidate
self._uid_queue.asked.add(candidate.uid)
return candidate
# stop queue wrapping around to lineage waiting for a tell
assert self._uid_queue.told, "More untold asks than population size (exceeds num_workers)"
@jrapin you are the expert for self._uid_queue.told (among so many things...), do you validate this assert?
Oh I guess the error is in class Portfolio. Let me propose a fix (fingers crossed :-) ).
> @jrapin you are the expert for self._uid_queue.told (among so many things...), do you validate this assert?
If it helps, my thinking was that after the initialization phase there should be a tell preceding every ask, which keeps the told queue non-empty. Even in the worst case, where popsize == num_workers and all workers are evaluating untold points, the worker that beats the others to the tell can use the same point again on the next ask.
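The argument above can be checked on a toy model. The sketch below is not the Nevergrad implementation: `ToyUidQueue` is a hypothetical stand-in that only keeps a told queue and an asked set, and it carries the same assert as the diff. It simulates the worst case where popsize equals num_workers and every ask is preceded by a tell.

```python
from collections import deque


class ToyUidQueue:
    """Hypothetical stand-in for the told/asked bookkeeping discussed here."""

    def __init__(self, uids):
        self.told = deque(uids)  # lineages ready to be asked again
        self.asked = set()       # lineages still waiting for a tell

    def ask(self):
        # Same idea as the assert in the diff: never wrap around to an untold lineage.
        assert self.told, "More untold asks than population size"
        uid = self.told.popleft()
        self.asked.add(uid)
        return uid

    def tell(self, uid):
        self.asked.discard(uid)
        self.told.append(uid)


popsize = num_workers = 3
queue = ToyUidQueue(range(popsize))
running = [queue.ask() for _ in range(num_workers)]  # all workers busy, told is empty
for _ in range(10):
    finished = running.pop(0)    # one worker finishes first...
    queue.tell(finished)         # ...and reports its result
    running.append(queue.ask())  # its next ask reuses the same lineage
    assert running[-1] == finished
```

In this model the assert only fires when an ask arrives without a preceding tell, which is the situation the later comments in this thread discuss.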
I've just sent a message to Jeremy, who knows that code better than anyone else and who might not have been close to github recently. Sorry for the delay; your PR is interesting.
I used to be strict about not going beyond num_workers, but I changed my mind a couple of years ago because there are many cases where you don't control all the details of what is happening (e.g. a process dies and you never get the result). Most of the time the user won't deal with it, and we should be robust to it to simplify use. The code was then supposed to be robust, but evidently there are corner cases :s
I would therefore rather make it robust to this case (would that just take removing duplicates in UidQueue.told? It should be very fast, so not a problem).
cc @bottler you seemed to disagree and want the user to strictly conform to the "contract", maybe we can discuss and adapt depending if I change your mind or not ;)
@Game4Move78 as a power user, would you rather it failed explicitly, or be robust to those corner cases? (Why did you happen to ask for more points?)
I followed the hyper-parameter settings of papers that used DE for HPO and set popsize to 20 explicitly without providing num_workers, and thought it would be robust. I then asked for more points and handed them to my own adaptive resource allocation + early stopping implementation that evaluated HPO choices with multiple budgets and only provided a tell to the NG optimiser when points were either stopped early or allocated maximum budget.
This would work fine for hundreds of points until it hit that corner case: a point in the told queue that has already been deleted from the population. My current workaround is to provide feedback immediately on the minimum budget and then treat all evaluations on higher budgets as unasked points, which works fine for DE.
If you want it less strict (I do too), how about we allow duplicates in told, but at L162 we add

    while lineage not in self.population:
        lineage = self._uid_queue.ask()

which I believe would toss away the lineages whose points were deleted from the population by a better tell that was not asked. Future asks will be biased towards duplicate points. I added a commit that checks for a duplicate tell using absence from the asked queue, although there may be a more intuitive way.
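As a minimal sketch of the skipping loop proposed above, the function below pops uids from a told queue until it finds one that is still in the population. `next_live_lineage` and the sample data are hypothetical, and returning `None` when the queue runs out is an assumption of this sketch; the snippet in the comment instead keeps calling `self._uid_queue.ask()`.

```python
from collections import deque


def next_live_lineage(told, population):
    """Pop uids from `told` until one is still present in `population`.

    Returns None if the queue is exhausted (an assumption of this sketch).
    """
    while told:
        lineage = told.popleft()
        if lineage in population:
            return lineage
        # Stale uid: its point was removed from the population, so skip it.
    return None


population = {"a": 1.0, "c": 3.0}
told = deque(["b", "a", "c"])  # "b" was deleted from the population earlier
first = next_live_lineage(told, population)
```

Here the stale uid "b" is discarded and "a" is returned, leaving "c" queued for the next ask.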
My personal preference to help users master those details where they can is to copy Ax's client interface with an abandon_tell. For most optimisers this would just tell a large value, and the BO optimisers might do something different to avoid damaging the model.
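To make the suggestion concrete, here is a rough sketch of what such a helper could look like for the "just tell a large value" case. Everything in it is hypothetical: `abandon_tell` does not exist in Nevergrad, `RecordingOptimizer` is a toy with an ask/tell shape rather than a real optimizer, and the default penalty is an arbitrary choice.

```python
class RecordingOptimizer:
    """Toy optimizer with an ask/tell shape; not a real Nevergrad class."""

    def __init__(self):
        self.losses = {}
        self._next = 0

    def ask(self):
        self._next += 1
        return self._next

    def tell(self, candidate, loss):
        self.losses[candidate] = loss


def abandon_tell(optimizer, candidate, penalty=1e12):
    """Hypothetical helper: report an abandoned candidate as a very bad loss."""
    optimizer.tell(candidate, penalty)


opt = RecordingOptimizer()
kept, dropped = opt.ask(), opt.ask()
opt.tell(kept, 0.25)                     # normal feedback
abandon_tell(opt, dropped, penalty=1e9)  # evaluation never finished
```

A model-based optimizer would probably want to override this behaviour rather than record the penalty as a real observation, which is the caveat raised in the comment.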
so ParaPortfolio is not really parallel.
Avoid adding uid to queue twice.
This handles both cases:
- More asks than workers (point used twice but added to told queue once)
- Ask without a tell (last worker grabs this point from asked queue)
facebookresearch#1462 (comment)
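The two cases in this commit message can be illustrated with a toy queue. This is a sketch under assumptions, not the code from the commit: `OnceToldQueue` is a hypothetical class in which a tell only appends to told when the uid is still in asked, and an ask falls back to the asked set when told is empty.

```python
from collections import deque


class OnceToldQueue:
    """Toy queue, not the Nevergrad class: a uid reaches `told` once per ask cycle."""

    def __init__(self, uids):
        self.told = deque(uids)
        self.asked = set()

    def ask(self):
        if self.told:
            uid = self.told.popleft()
        else:
            # Ask without a tell: reuse a lineage that is still waiting for one.
            # (Raises StopIteration if nothing was ever asked.)
            uid = next(iter(self.asked))
        self.asked.add(uid)
        return uid

    def tell(self, uid):
        # Presence in `asked` marks the first tell; later tells for it are ignored.
        if uid in self.asked:
            self.asked.discard(uid)
            self.told.append(uid)


queue = OnceToldQueue(["p"])
first = queue.ask()    # takes "p" from told
second = queue.ask()   # told is empty, so "p" is handed out again
queue.tell(first)      # "p" goes back to told
queue.tell(second)     # duplicate tell: "p" is not appended a second time
```

After both tells, the told queue holds "p" exactly once, so a later ask cannot pop a uid that was already consumed.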
Reworded comment
@jrapin Any chance of getting this merged 😃? Line nevergrad/nevergrad/optimization/utils.py Line 340 in 8403d6c
UidQueue.asked to check for presence in told , and self._uid_queue.asked is configured directly on many lines in _DE already.
I believe this code enforces that for asked points with the same parent, the lineage will be added to the told queue only once in subsequent tells:
#1461
Types of changes
Motivation and Context / Related issue
#1461
How Has This Been Tested (if it applies)
Checklist