@jakelishman
Member

Previously we used a B-tree map internally, which is more elegant algorithmically but had rather worse runtime performance. This replacement "group then sort" implementation uses more straightforward and cache-friendlier data types. For some benchmarks of non-canonical observables produced by the qiskit-fermions package, this reduced the runtime by a third.

There is clear scope for additional improvement here, such as drop-in replacing the hashbrown hashmap with a sharded concurrent one like dashmap, but since that would be introducing a new dependency, it can be a follow-up.
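
For readers unfamiliar with the two strategies, here is a minimal sketch of the difference, not the code in this PR. It assumes a simplified term representation (a byte-vector key with a real `f64` coefficient) and uses `std::collections::HashMap` in place of `hashbrown` so it has no dependencies; the names `Key`, `canonicalize_btree` and `canonicalize_group_then_sort` are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a term key (for example an encoded operator string).
type Key = Vec<u8>;

/// B-tree approach: keys are kept in order during the whole accumulation,
/// at the cost of pointer-chasing through tree nodes on every insertion.
fn canonicalize_btree(terms: &[(Key, f64)], tol: f64) -> Vec<(Key, f64)> {
    let mut acc: BTreeMap<Key, f64> = BTreeMap::new();
    for (key, coeff) in terms {
        *acc.entry(key.clone()).or_insert(0.0) += *coeff;
    }
    // Drop terms whose summed coefficient is (numerically) zero.
    acc.into_iter().filter(|(_, c)| c.abs() > tol).collect()
}

/// "Group then sort": sum duplicates in a hash map first (no ordering cost),
/// then sort only the surviving terms once, in a contiguous vector.
fn canonicalize_group_then_sort(terms: &[(Key, f64)], tol: f64) -> Vec<(Key, f64)> {
    let mut acc: HashMap<&[u8], f64> = HashMap::with_capacity(terms.len());
    for (key, coeff) in terms {
        *acc.entry(key.as_slice()).or_insert(0.0) += *coeff;
    }
    let mut out: Vec<(Key, f64)> = acc
        .into_iter()
        .filter(|(_, c)| c.abs() > tol)
        .map(|(k, c)| (k.to_vec(), c))
        .collect();
    out.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    out
}
```

Both functions visit the input in the same order, so they sum each key's coefficients identically and return the same sorted list; only the intermediate data structure differs.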

Summary

Details and comments

cc @mrossinek

@jakelishman jakelishman added this to the 2.4.0 milestone Jan 19, 2026
@jakelishman jakelishman requested a review from a team as a code owner January 19, 2026 12:32
@jakelishman jakelishman added the "performance", "Changelog: New Feature" and "mod: quantum info" labels Jan 19, 2026
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@coveralls

Pull Request Test Coverage Report for Build 21137603349

Details

  • 12 of 13 (92.31%) changed or added relevant lines in 1 file are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.01%) to 87.95%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/quantum_info/src/sparse_observable/mod.rs | 12 | 13 | 92.31% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/circuit/src/parameter/parameter_expression.rs | 1 | 87.09% |
| crates/qasm2/src/lex.rs | 5 | 92.29% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 21112220188 | 0.01% |
| Covered Lines | 100204 |
| Relevant Lines | 113933 |

💛 - Coveralls

Member

@mrossinek mrossinek left a comment

This looks great! I did not yet re-run the benchmark with which I was testing due to resource constraints but will try to do so later this week 👍

@jakelishman
Member Author

Max: if you're able to comment with benchmark results from your real-world use case, that'd be great, thanks!

@mrossinek
Member

Here are single-run benchmark results comparing the performance of qk_obs_canonicalize before this PR (50% opaque) and after it (100% opaque). The panel in question for this PR is in the bottom left.

We can see a clear improvement (modulo some outliers which I believe have to do with parallel jobs running on the cluster that I used, impacting the measured wall time; I also noticed heavy RAM and SWAP usage, so this may have impacted runtime, too).

[Figures: benchmark plots for nitrogen_ccpvdz and nitrogen_ccpvtz]

Overall, I'd say this is a great step in the right direction, although the parallelization does not yet appear to yield the performance one would expect. In this particular case, the total wall time increases with the number of threads, because the SparseObservable that has to be canonicalized is larger when more threads were used in the previous step (jordan_wigner), simply because there is more duplication of terms. (I should probably be recording the exact size of that data structure, but did not during this run. If desired, I can get that data and post an update here.)
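
To illustrate the duplication effect described above, here is a toy model, not the actual jordan_wigner code. It assumes each worker thread combines like terms only within its own chunk of the input, so a term that appears in several chunks is emitted once per chunk; the function names and the `(u32, f64)` term type are hypothetical.

```rust
use std::collections::HashMap;
use std::thread;

/// Toy producer: split `labels` into one chunk per worker; each worker sums
/// duplicates only locally, and the per-worker results are concatenated.
fn produce_chunked(labels: &[u32], workers: usize) -> Vec<(u32, f64)> {
    let workers = workers.max(1);
    let chunk_len = ((labels.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = labels
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<u32, f64> = HashMap::new();
                    for &label in chunk {
                        *local.entry(label).or_insert(0.0) += 1.0;
                    }
                    let mut terms: Vec<(u32, f64)> = local.into_iter().collect();
                    terms.sort_unstable_by_key(|t| t.0);
                    terms
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<_>>()
    })
}

/// Toy canonicalization: merge duplicates across all chunks, then sort.
fn merge(terms: &[(u32, f64)]) -> Vec<(u32, f64)> {
    let mut acc: HashMap<u32, f64> = HashMap::new();
    for &(label, coeff) in terms {
        *acc.entry(label).or_insert(0.0) += coeff;
    }
    let mut out: Vec<(u32, f64)> = acc.into_iter().collect();
    out.sort_unstable_by_key(|t| t.0);
    out
}
```

In this model, the eight labels `[1, 2, 3, 1, 2, 3, 1, 2]` yield 3 terms with one worker, 6 with two and 8 with four, while `merge` always reduces them to the same 3 terms. The canonicalization step therefore has more input to process as the producer's thread count grows.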

@jakelishman
Member Author

Thanks for the graphs Max. There's no meaningful parallelisation in this algorithm yet, so that matches my expectations.

The memory use will have increased somewhat in the serial case, but by a factor of about 2x at most, I think. I don't know what Rayon's implementation of par_sort does with respect to allocations; it's possible that it's overallocating.

@jakelishman
Member Author

Status update: Max and I did a little more work on going further with this, and it fairly immediately became apparent that we need tighter exposure of the threading control on a per-function-call basis. Max was struggling with overhead from accidentally double-dispatching parallelism, and with overhead from parallel data structures in situations where we logically should only have a single thread available.
