mkeeter commented Mar 6, 2024

Right now, a significant amount of work in the Crucible Downstairs is simply hashing data.

This hashing takes up about 17% of our flamegraph, and it all happens in the do_work task (dw_task):

[flamegraph screenshot]

This PR moves that hash checking off-thread onto the rayon thread pool, which is the usual home for this kind of CPU-bound blocking work:

    ┌──────────┐              ┌───────────┐
    │FramedRead│              │FramedWrite│
    └────┬─────┘              └─────▲─────┘
         │                          │
         │         ┌────────────────┴────────────────────┐
         │         │         framed_write_task           │
         │         └─▲─────▲──────────────────▲──────────┘
         │           │     │                  │
         │       ping│     │invalid           │
         │  ┌────────┘     │frame             │responses
         │  │              │errors            │
         │  │              │                  │
    ┌────▼──┴─┐ message   ┌┴──────┐  job     ┌┴────────┐
    │resp_loop├──────────►│pf_task├─────────►│ dw_task │
    └──┬───▲──┘ channel   └──┬────┘ channel  └▲────────┘
       │   │                 │                │
  defer│   │oneshot          │                │
      ┌▼───┴┐                │                │
      │rayon│             add│work         new│work
      └─────┘                │                │
                             │                │
    per-connection           │                │
   ========================= │ ============== │ ===============
    shared state          ┌──▼────────────────┴────────────┐
                          │           Downstairs           │
                          └────────────────────────────────┘

The strategy is very similar to #1066 and #1089, and uses the same DeferredQueue data structure (moved to crucible_common):

  • When messages arrive, they are pushed onto the deferred queue if (1) they are writes or (2) there are other messages already in the deferred queue. This ensures that messages always come back in order, while not deferring unnecessarily (see the sketch after this list).
  • Deferred messages arrive (in order) back at resp_loop with all of their write metadata (e.g. hashes) precomputed. We then use those hashes in region_write_pre, instead of computing them in Region::region_write and ExtentInner::write.
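As a rough sketch of that deferral rule (this is not the real DeferredQueue in crucible_common; Message, Processed, and the placeholder hash are made up for illustration), assuming a simple FIFO of oneshot receivers:

    use std::collections::VecDeque;
    use tokio::sync::oneshot;

    enum Message {
        Write(Vec<Vec<u8>>), // blocks whose hashes we want to precompute
        Other(String),       // everything else
    }

    enum Processed {
        Write { hashes: Vec<u64> },
        Other(String),
    }

    #[derive(Default)]
    struct DeferredQueue {
        pending: VecDeque<oneshot::Receiver<Processed>>,
    }

    impl DeferredQueue {
        /// Defer a message if it is a write, or if anything is already
        /// deferred (so it cannot overtake earlier deferred work);
        /// otherwise process it inline and return the result immediately.
        fn push(&mut self, msg: Message) -> Option<Processed> {
            let must_defer = matches!(msg, Message::Write(_)) || !self.pending.is_empty();
            if !must_defer {
                return Some(match msg {
                    Message::Other(s) => Processed::Other(s),
                    Message::Write(_) => unreachable!(),
                });
            }
            let (tx, rx) = oneshot::channel();
            self.pending.push_back(rx);
            rayon::spawn(move || {
                let out = match msg {
                    Message::Write(blocks) => Processed::Write {
                        // Placeholder for the real integrity hash
                        hashes: blocks.iter().map(|b| b.len() as u64).collect(),
                    },
                    Message::Other(s) => Processed::Other(s),
                };
                let _ = tx.send(out);
            });
            None
        }

        /// Pop the oldest deferred result, awaiting it so that results
        /// come back in the same order the messages arrived.
        async fn next(&mut self) -> Option<Processed> {
            let rx = self.pending.pop_front()?;
            rx.await.ok()
        }
    }

In this shape, resp_loop calls push() on every incoming message, handles any immediately-returned result inline, and otherwise drains results in order via next().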

This is a roughly 10% speedup for large writes:

1M WRITE: bw=809MiB/s (848MB/s), 809MiB/s-809MiB/s (848MB/s-848MB/s), io=47.5GiB (51.0GB), run=60129-60129msec
4K WRITE: bw=22.8MiB/s (23.9MB/s), 22.8MiB/s-22.8MiB/s (23.9MB/s-23.9MB/s), io=1367MiB (1433MB), run=60010-60010msec
1M WRITE: bw=808MiB/s (847MB/s), 808MiB/s-808MiB/s (847MB/s-847MB/s), io=47.4GiB (50.9GB), run=60046-60046msec
4M WRITE: bw=816MiB/s (855MB/s), 816MiB/s-816MiB/s (855MB/s-855MB/s), io=47.9GiB (51.4GB), run=60098-60098msec

Previously, I was seeing numbers in the 700-ish range, e.g.

1M WRITE: bw=725MiB/s (760MB/s), 725MiB/s-725MiB/s (760MB/s-760MB/s), io=42.6GiB (45.7GB), run=60134-60134msec
4K WRITE: bw=20.1MiB/s (21.1MB/s), 20.1MiB/s-20.1MiB/s (21.1MB/s-21.1MB/s), io=1208MiB (1266MB), run=60027-60027msec
1M WRITE: bw=720MiB/s (755MB/s), 720MiB/s-720MiB/s (755MB/s-755MB/s), io=42.2GiB (45.3GB), run=60052-60052msec
4M WRITE: bw=716MiB/s (751MB/s), 716MiB/s-716MiB/s (751MB/s-751MB/s), io=42.0GiB (45.1GB), run=60099-60099msec
