Eliminate random seek-writes to out_spent.bin during index build #68

@arminsabouri

During the parse pass, every transaction input triggers a random write to out_spent.bin: the parser seeks to the spent output's position and writes 8 bytes (the TxInId). With ~2.5B inputs on mainnet, that is ~2.5B individual seek() + write_all() calls. This is the only index that isn't built sequentially.

Relevant code: src/crates/primitives/src/parser.rs around visit_tx_in, and OutSpentByIndex::set() in src/crates/primitives/src/indecies.rs.

Decouple the out_spent.bin write from the parse pass:

  1. During parsing, instead of calling out_spent_index.set() on each input, accumulate (TxOutId, TxInId) pairs into an in-memory buffer.
  2. After the parse pass completes, sort the buffer by TxOutId.
  3. Write out_spent.bin in a single sequential pass, merging the sorted pairs with the already-sequential output range.
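
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: `SpentBuffer`, `record`, and `write_sequential` are hypothetical names, both ID types are assumed to be plain `u64`, and the `UNSPENT` sentinel and little-endian encoding are assumptions about the on-disk format that should be checked against `OutSpentByIndex`.

```rust
use std::io::{self, Write};

// Hypothetical stand-ins; the real TxOutId / TxInId types live in the crate.
type TxOutId = u64;
type TxInId = u64;

// Assumed marker for "not spent"; the actual file format may use another value.
const UNSPENT: u64 = u64::MAX;

/// Collects (TxOutId, TxInId) pairs during the parse pass instead of seeking.
#[derive(Default)]
struct SpentBuffer {
    pairs: Vec<(TxOutId, TxInId)>,
}

impl SpentBuffer {
    /// Step 1: called once per input in place of a seek + write.
    fn record(&mut self, out_id: TxOutId, in_id: TxInId) {
        self.pairs.push((out_id, in_id));
    }

    /// Steps 2 and 3: sort by TxOutId, then emit one 8-byte slot per output
    /// in ascending order so the writer never has to seek.
    fn write_sequential<W: Write>(mut self, total_outputs: u64, w: &mut W) -> io::Result<()> {
        self.pairs.sort_unstable_by_key(|&(out_id, _)| out_id);
        let mut pending = self.pairs.into_iter().peekable();
        for out_id in 0..total_outputs {
            let mut value = UNSPENT;
            // Consume every pair at or before this slot; if an id repeats,
            // the last pair consumed for it wins.
            while let Some(&(id, in_id)) = pending.peek() {
                if id > out_id {
                    break;
                }
                if id == out_id {
                    value = in_id;
                }
                pending.next();
            }
            w.write_all(&value.to_le_bytes())?;
        }
        Ok(())
    }
}
```

In practice the writer would likely be a `std::io::BufWriter` wrapped around the out_spent.bin file, so the per-slot `write_all` calls are batched into large sequential writes.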

If the full buffer does not fit in memory, this can be done in chunks: periodically sort and flush the buffer, then do a final merge pass over the sorted chunks when writing the file.
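
Chunking matters because of the buffer size: if each pair takes 16 bytes (an assumption; only the 8-byte TxInId is stated above), ~2.5B pairs come to roughly 40 GB. The final merge over sorted chunks could look like the sketch below. `merge_sorted_chunks` is a hypothetical helper, and the chunks are held in memory here for brevity, whereas a real implementation would presumably read them back from temporary files.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// (TxOutId, TxInId), both assumed to be u64 for this sketch.
type Pair = (u64, u64);

/// K-way merge: each chunk is already sorted by TxOutId; `emit` receives
/// every pair in globally ascending order, so results can be streamed
/// straight to the sequential writer.
fn merge_sorted_chunks(chunks: Vec<Vec<Pair>>, mut emit: impl FnMut(Pair)) {
    let mut iters: Vec<_> = chunks.into_iter().map(|c| c.into_iter()).collect();
    // Min-heap (via Reverse) holding the current head of each chunk.
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(pair) = it.next() {
            heap.push(Reverse((pair, idx)));
        }
    }
    while let Some(Reverse((pair, idx))) = heap.pop() {
        emit(pair);
        // Refill the heap from the chunk that just yielded a pair.
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
}
```

The heap only ever holds one entry per chunk, so memory use during the merge is proportional to the number of chunks rather than the number of inputs.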

  • out_spent.bin is written in one sequential pass instead of random seeks.
  • Build time for full mainnet sync should decrease meaningfully, particularly on HDD or NVMe under write pressure.
  • No change to the on-disk format or read path.

Acceptance criteria (AC):
  • OutSpentByIndex::set() is no longer called during the parse visitor loop.
  • out_spent.bin contents are identical to the current implementation (verified by existing round-trip tests).
  • Build time on a representative dataset is measured before and after.
