-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Change SparseObservable.canonicalize to HashMap/Vec internals
#15582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Change SparseObservable.canonicalize to HashMap/Vec internals
#15582
Conversation
Previously we used a B-tree map internally, which is more elegant algorithmically but hard rather worse runtime performance. This replacement "group then sort" implementation uses more straight-forward and cache-friendlier data types. For some benchmarks of non-canonical observables produced by the `qiskit-fermions` package, this reduced the runtime by a third. There is clear scope for additional improvement here, such as drop-in replacing the `hashbrown` hashmap with a sharded concurrent one like `dashmap`, but since that would be introducing a new dependency, it can be a follow-up.
|
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 21137603349Details
💛 - Coveralls |
mrossinek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I did not yet re-run the benchmark with which I was testing due to resource constraints but will try to do so later this week 👍
|
Max: if you're able to comment at all with benchmark results from this from your real-world use-case, it'd be great thanks! |
|
Thanks for the graphs Max. There's no meaningful parallelisation in this algorithm yet, so that matches my expectations. The memory use will have increased somewhat in the serial case, but by a factor of like 2x at most, I think. I don't know what Rayon's implementation of |
|
Status update: Max and I did a little more work looking at going further with this, and fairly immediately became apparent that we need tighter exposure of the threading control on a per-function-call basis; Max was struggling with overhead from accidentally double-dispatching parallelism, and overhead from parallel data structures in situations where we logically should only have a single thread available. |


Previously we used a B-tree map internally, which is more elegant algorithmically but hard rather worse runtime performance. This replacement "group then sort" implementation uses more straight-forward and cache-friendlier data types. For some benchmarks of non-canonical observables produced by the
qiskit-fermionspackage, this reduced the runtime by a third.There is clear scope for additional improvement here, such as drop-in replacing the
hashbrownhashmap with a sharded concurrent one likedashmap, but since that would be introducing a new dependency, it can be a follow-up.Summary
Details and comments
cc @mrossinek