Conversation

rocallahan (Contributor) commented Oct 31, 2025

If your work is part of a larger effort, please discuss your general plans on Discourse first to align your vision with maintainers.

See https://yosyshq.discourse.group/t/parallel-optmergepass-implementation/87/10, although I've changed the approach somewhat. #5415 is also relevant, albeit superseded by this PR.

What are the reasons/motivation for this change?

We want to make SigSpec usable in multithread contexts. This requires making all const methods actually read-only.

Explain how this is achieved.

The main idea here is to change the representation of SigSpec from "vector of SigChunk or vector of SigBit" to "single inlined SigChunk or vector of SigBit". That change makes it quite easy to implement SigSpec::operator[] const efficiently in a read-only manner.
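Roughly, the idea looks like this (a minimal sketch with placeholder types and names, not the actual RTLIL declarations):

```cpp
#include <cassert>
#include <vector>

// Illustrative placeholders, not the real RTLIL types.
struct Wire;
struct SigBit { Wire *wire; int offset; };
struct SigChunk { Wire *wire; int offset; int width; };

// Sketch of the new representation: either a single inlined SigChunk or an
// unpacked vector of SigBits, never both at once.
class SigSpecSketch {
    SigChunk chunk_{};              // used while !unpacked_
    std::vector<SigBit> bits_;      // used once unpacked_
    bool unpacked_ = false;
public:
    // Read-only indexing: no lazy pack/unpack happens here, so concurrent
    // const access from multiple threads never mutates shared state.
    SigBit operator[](int index) const {
        if (unpacked_) {
            assert(index >= 0 && index < (int)bits_.size());
            return bits_[index];
        }
        assert(index >= 0 && index < chunk_.width);
        return SigBit{chunk_.wire, chunk_.offset + index};
    }
};
```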

Making that change requires updating all code that directly accesses the chunks_ vector to use a "chunks iterator" which can work with either representation.
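As a hedged sketch, such a chunks iterator could look roughly like this (placeholder types again; the in-tree iterator API may differ):

```cpp
#include <cstddef>
#include <vector>

// Placeholder types, as in the previous sketch.
struct Wire;
struct SigBit { Wire *wire; int offset; };
struct SigChunk { Wire *wire; int offset; int width; };

// Sketch of a chunks iterator that yields SigChunk values from either
// representation of the containing SigSpec.
class ChunksIterSketch {
    const SigChunk *single_;            // non-null in the single-chunk case
    const std::vector<SigBit> *bits_;   // non-null in the unpacked case
    size_t pos_ = 0;
public:
    ChunksIterSketch(const SigChunk *single, const std::vector<SigBit> *bits)
        : single_(single), bits_(bits) {}

    bool done() const {
        return single_ ? pos_ > 0 : pos_ >= bits_->size();
    }

    // Returns the next chunk: either the inlined chunk itself, or a run of
    // consecutive bits of the same wire coalesced into a temporary chunk.
    SigChunk next() {
        if (single_) {
            pos_ = 1;
            return *single_;
        }
        const SigBit &first = (*bits_)[pos_++];
        SigChunk c{first.wire, first.offset, 1};
        while (pos_ < bits_->size() && (*bits_)[pos_].wire == c.wire &&
               (*bits_)[pos_].offset == c.offset + c.width) {
            c.width++;
            pos_++;
        }
        return c;
    }
};
```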

The new representation works well in practice. I did some analysis of the use of SigSpecs and observed that most SigSpecs can be represented as a single SigChunk (partly because most SigSpecs are actually a single bit). I actually see a nice speedup from this PR:

main: (c9a4c608cef74e0ee4c3dab17288396d8e1afcbf)
Benchmark 1: ./yosys -p "read_verilog -sv -I/usr/local/google/home/rocallahan/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth"
  Time (mean ± σ):     21.297 s ±  0.263 s    [User: 20.057 s, System: 1.264 s]
  Range (min … max):   20.828 s … 21.692 s    10 runs

This PR:
Benchmark 1: ./yosys -p "read_verilog -sv -I/usr/local/google/home/rocallahan/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth"
  Time (mean ± σ):     17.412 s ±  0.099 s    [User: 16.342 s, System: 1.082 s]
  Range (min … max):   17.293 s … 17.652 s    10 runs

The speedup probably comes partly from avoiding heap allocation in the single-SigChunk case, and partly from avoiding repeated representation flipping. In the new code, SigSpecs (if not empty) start off in the SigChunk representation, and we keep them in that representation as long as possible. If at some point we need to unpack to the bits representation, we do that once, and the SigSpec then stays in the bits representation forever.
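To make that one-way transition concrete, here is a minimal sketch of an unpack step (placeholder types and method names, not the in-tree code):

```cpp
#include <vector>

// Placeholder types, as in the earlier sketches.
struct Wire;
struct SigBit { Wire *wire; int offset; };
struct SigChunk { Wire *wire; int offset; int width; };

class SigSpecUnpackSketch {
    SigChunk chunk_{};
    std::vector<SigBit> bits_;
    bool unpacked_ = false;

    // One-way transition: explode the inlined chunk into individual bits the
    // first time bit-level access requires it, and never re-pack afterwards.
    void unpack() {
        if (unpacked_)
            return;
        bits_.reserve(chunk_.width);
        for (int i = 0; i < chunk_.width; i++)
            bits_.push_back(SigBit{chunk_.wire, chunk_.offset + i});
        unpacked_ = true;
    }

public:
    // Example mutator that needs bit granularity and therefore unpacks.
    void replace_bit(int index, const SigBit &bit) {
        unpack();
        bits_[index] = bit;
    }
};
```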

Of course it is still possible to construct testcases where this PR would regress performance --- for example, if you have a lot of SigSpecs that can be compactly represented as two very wide SigChunks. I hope that kind of thing is not a problem in practice, but you never know... If you have any testcases you want me to investigate, I'd be happy to do that.

You can see from the commits that I was able to maintain source compatibility with the rest of the in-tree code in almost all cases. The main issue that worries me is that a few callers of SigSpec::chunks() assume it returns the same underlying object in all cases, so that, e.g., an iterator returned by sig.chunks().begin() can always be compared to sig.chunks().end().
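To spell out the pattern I mean (hypothetical caller code, not quoted from the tree):

```cpp
#include <cstdio>

// Hypothetical caller code illustrating the concern; SigSpecLike stands in
// for SigSpec and is not a real in-tree type.
template <typename SigSpecLike>
void risky_iteration(const SigSpecLike &sig)
{
    // Risky if chunks() returns a fresh range object on each call: an
    // iterator obtained from the first call is not guaranteed to be
    // comparable with end() obtained from the second call.
    for (auto it = sig.chunks().begin(); it != sig.chunks().end(); ++it)
        std::printf("chunk of width %d\n", it->width);
}

template <typename SigSpecLike>
void safe_iteration(const SigSpecLike &sig)
{
    // Safer: materialize the range once and iterate over that single object.
    auto chunks = sig.chunks();
    for (auto it = chunks.begin(); it != chunks.end(); ++it)
        std::printf("chunk of width %d\n", it->width);
}
```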

I'm sorry that this PR is quite large, but to avoid regressing performance we need to merge a lot of work at once. The final commit fixes the thread-safety of hash_ and is probably performance-neutral, so it could be broken out if desired.
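For context, one common way to make a lazily cached hash thread-safe is an atomic cache along these lines (a simplified sketch of the general technique, not the exact diff):

```cpp
#include <atomic>
#include <cstdint>

// Simplified sketch of a thread-safe lazily cached hash; a real SigSpec
// would hash its contents rather than a placeholder constant.
class HashCacheSketch {
    mutable std::atomic<uint32_t> hash_{0};   // 0 means "not computed yet"

    uint32_t compute_hash() const {
        uint32_t h = 0x9e3779b9u;             // placeholder for hashing the contents
        return h ? h : 1;                     // never return the sentinel value
    }
public:
    uint32_t get_hash() const {
        uint32_t h = hash_.load(std::memory_order_relaxed);
        if (h == 0) {
            // Racing threads may each compute the hash, but they compute the
            // same value, so the redundant stores are harmless.
            h = compute_hash();
            hash_.store(h, std::memory_order_relaxed);
        }
        return h;
    }
};
```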

…equality, just use the width and chunkwise comparisons

This avoids having to pack and compute hashes, and generally results in a
simpler ordering.
… that's inline in the SigSpec.

Single-chunk SigSpecs are very common and this avoids a heap allocation. It also simplifies
some algorithms.
widlarizer self-assigned this Nov 3, 2025