Optimizations for write_cfg_data
#4569
Conversation
Before I forget, I think it would be a good idea to have the set of pending nodes made explicit and maintained as they are modified. This should introduce a major speed-up, since …
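For illustration only, a rough sketch of this kind of bookkeeping might look like the following; the names are hypothetical and do not correspond to the actual pyk/kontrol API. The idea is to update the pending set whenever nodes are added, expanded, or closed, rather than recomputing it by scanning every node.

```python
# Hypothetical sketch, not real pyk code: maintain the pending-node set as
# explicit state, updated on each modification instead of rebuilt by scanning
# all nodes on every query.
class PendingTracker:
    def __init__(self) -> None:
        self._pending: set[int] = set()

    def on_node_added(self, node_id: int) -> None:
        self._pending.add(node_id)

    def on_node_expanded(self, node_id: int, successor_ids: list[int]) -> None:
        # An expanded node is no longer pending; its new successors are.
        self._pending.discard(node_id)
        self._pending.update(successor_ids)

    def on_node_closed(self, node_id: int) -> None:
        # Covers stuck, vacuous, or otherwise terminal nodes.
        self._pending.discard(node_id)

    @property
    def pending(self) -> frozenset[int]:
        # Returns a snapshot of the pending IDs; no scan over all nodes needed.
        return frozenset(self._pending)
```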
This looks like a good improvement, but I think we might need to assess it on engagement code. I am testing it with intermittent writing to disk (every …
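As an illustration only (the actual interval is cut off above), intermittent writing could look like the sketch below; the 50-step interval is an arbitrary placeholder, not a value from this thread.

```python
# Hypothetical sketch of intermittent writing: flush proof data to disk only
# every WRITE_INTERVAL steps instead of after every step.
from typing import Callable

WRITE_INTERVAL = 50  # arbitrary placeholder value

def maybe_write(step: int, write_fn: Callable[[], None]) -> None:
    # Write on every WRITE_INTERVAL-th step (including step 0).
    if step % WRITE_INTERVAL == 0:
        write_fn()
```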
@PetarMax Thanks, that is a good suggestion for how we can solve the performance issues with the …
Edit: I think I misunderstood. The performance gain from this PR on the engagement code I tested was also greater than what I wrote here about the test I ran from kontrol's test suite, although I don't know the exact reason why. Maybe it's just because the individual node JSONs are larger for the Lido proofs. But I will still test and see whether these two techniques stack.
This looks good to me. @ehildenb, @tothtamas28?
Makes progress in fixing the slowdown as proofs get large.

I'll call the "sync time" the time between receiving a step result and starting to wait for the next step result on the master thread.

As proofs accumulate more and more nodes, calling `to_dict()` on every node gets progressively more expensive, and this was taking up a substantial portion of the sync time once proofs reached 500+ nodes.

Instead of generating the entire KCFG dict with `KCFG.to_dict()` before passing it to `KCFGStore.write_cfg_data()`, which discards all of that work anyway for the final product and replaces it with just a list of the node IDs, it is significantly faster to pass the list of node IDs directly and let the store use the KCFG object, which already has a dictionary storing nodes by ID, to find the vacuous and stuck nodes and to get the newly created nodes. This way we can call `to_dict()` on each node only when it is first created.
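A minimal sketch of the idea, assuming a simplified CFG and store; `SimpleCFG`, `SimpleStore`, and the two `write_cfg_data_*` methods are illustrative stand-ins, not the real `KCFG`/`KCFGStore` signatures in pyk:

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


class SimpleCFG:
    """Stand-in for a KCFG: nodes live in a dict keyed by node ID."""

    def __init__(self) -> None:
        self.nodes: dict[int, dict[str, Any]] = {}
        self.created: set[int] = set()  # IDs of nodes added since the last write

    def add_node(self, node_id: int, data: dict[str, Any]) -> None:
        self.nodes[node_id] = data
        self.created.add(node_id)

    def to_dict(self) -> dict[str, Any]:
        # Old behaviour: serialize every node on every write, O(total nodes).
        return {'nodes': [{'id': i, **d} for i, d in self.nodes.items()]}


class SimpleStore:
    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write_cfg_data_old(self, cfg_dict: dict[str, Any]) -> None:
        # Old interface: receives the full dict, then discards the node bodies
        # and keeps only the list of node IDs in the top-level file.
        node_ids = [n['id'] for n in cfg_dict['nodes']]
        (self.root / 'cfg.json').write_text(json.dumps({'nodes': node_ids}))

    def write_cfg_data_new(self, cfg: SimpleCFG, node_ids: list[int]) -> None:
        # New interface: receives the CFG object and the node IDs directly,
        # and serializes only the nodes created since the last write.
        for node_id in sorted(cfg.created):
            data = {'id': node_id, **cfg.nodes[node_id]}
            (self.root / f'node_{node_id}.json').write_text(json.dumps(data))
        cfg.created.clear()
        # The top-level file only ever needed the list of node IDs.
        (self.root / 'cfg.json').write_text(json.dumps({'nodes': node_ids}))
```

In this sketch the per-write cost drops from serializing every node to serializing only the nodes created since the last write, plus a small top-level file listing the node IDs.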
To give an idea of a benchmark, I used `LoopsTest.test_sum_1000()` (linear and long-running) with `--max-depth 1` and `--max-iterations 1000`. Before this change it reaches 1000 iterations in 59:06, and after this change it does so in 40:31. Before the change, the sync time as this proof approached 1000 nodes ranged between about 3.4 and 4.2 seconds; after the change it ranged from about 1.39 to 1.54 seconds.

The big remaining chunk of sync time when the proof gets large seems to be in `get_steps()`.