Why (the convergence)
After the Redis and DragonflyDB campaigns (docs/bench/OPTIMIZATION_LOG.md), IronCache is at memory parity-or-better and faster per core than every competitor, with one structural gap remaining vs Dragonfly. Two separate "clear win" levers turn out to be the SAME lever:
- (a) A uniform memory win: escape
hashbrown's power-of-two DOUBLING TROUGH. At keycounts where a per-shard table sits just past a doubling (e.g. 1M keys / 2 shards = 500k/shard at about 48% load), roughly 9 of about 19 table bytes/key are wasted empty buckets. Dragonfly's incrementally-split Dashtable never sits in that trough (it is why IronCache ties Dragonfly at 1M but wins at 900k, and the apt-7.x vs 8.x sweep confirmed both engines oscillate with keycount).
- (b) True O(1) eviction: the current pooled evictor is O(N/CAP) amortized because
hashbrown::HashTable exposes no cheap random bucket access, so the refill must scan for the coldest CAP. Dragonfly evicts one slot from the segment it is already touching = O(1), zero per-key state. A fair Dragonfly eviction head-to-head is even blocked at small scale (Dragonfly's 256 MiB/thread boot floor forces a multi-GB dataset, exactly where O(N/CAP) trails O(1)).
A Dragonfly-style DASH (extendible-hashing, segmented) table provides BOTH: random segment/bucket access gives O(1) segment-local eviction, and segment-at-a-time growth eliminates the doubling trough. Per-slot metadata is roughly equal to hashbrown's (both about 1 control / fingerprint byte), so it does not cost memory.
Scope and risk
- The LARGEST remaining structural bet: a core store-table rewrite that replaces the per-shard
hashbrown::HashTable<Entry>, with heavy unsafe (raw segment slots) and a real throughput-regression risk vs hashbrown's SIMD probe.
- Must preserve: the frozen
Store waist + ValueRef/RmwEntry/side-traits, the deterministic scan_hash SCAN cursor (ADR-0003), the freq-in-object eviction (the 2-bit frequency on each object), and the jemalloc accounting.
- Gate hard: the full suite + miri (strict-provenance) + the per-PR perf-gate + the pinned-Linux and Dragonfly head-to-heads. It must NOT regress the current speed/latency wins to gain memory; if it does, it is not worth it.
Smaller follow-up to do first
Bounded-selection refill in evict_to_fit_pooled (collect only the CAP coldest candidates via a bounded heap, dropping the per-refill O(resident) clone-all + sort). A cheap constant-factor eviction win that does NOT need the table rewrite, and it makes large-resident eviction feasible.
Acceptance
A clear, uniform memory win over Dragonfly across keycounts (no doubling trough) AND O(1)-class segment-local eviction, with the speed/latency wins preserved, proven on the pinned-Linux + Dragonfly head-to-heads.
References
docs/bench/OPTIMIZATION_LOG.md ("FULL COMPETITOR MATRIX", the DragonflyDB campaign synthesis, the amortized-eviction round 10).
- The Dragonfly-strategy deep-research (Dashtable density + integrated segment-local eviction; mimalloc ruled out as a non-lever).
Why (the convergence)
After the Redis and DragonflyDB campaigns (
docs/bench/OPTIMIZATION_LOG.md), IronCache is at memory parity-or-better and faster per core than every competitor, with one structural gap remaining vs Dragonfly. Two separate "clear win" levers turn out to be the SAME lever:hashbrown's power-of-two DOUBLING TROUGH. At keycounts where a per-shard table sits just past a doubling (e.g. 1M keys / 2 shards = 500k/shard at about 48% load), roughly 9 of about 19 table bytes/key are wasted empty buckets. Dragonfly's incrementally-split Dashtable never sits in that trough (it is why IronCache ties Dragonfly at 1M but wins at 900k, and the apt-7.x vs 8.x sweep confirmed both engines oscillate with keycount).hashbrown::HashTableexposes no cheap random bucket access, so the refill must scan for the coldest CAP. Dragonfly evicts one slot from the segment it is already touching = O(1), zero per-key state. A fair Dragonfly eviction head-to-head is even blocked at small scale (Dragonfly's 256 MiB/thread boot floor forces a multi-GB dataset, exactly where O(N/CAP) trails O(1)).A Dragonfly-style DASH (extendible-hashing, segmented) table provides BOTH: random segment/bucket access gives O(1) segment-local eviction, and segment-at-a-time growth eliminates the doubling trough. Per-slot metadata is roughly equal to hashbrown's (both about 1 control / fingerprint byte), so it does not cost memory.
Scope and risk
hashbrown::HashTable<Entry>, with heavy unsafe (raw segment slots) and a real throughput-regression risk vs hashbrown's SIMD probe.Storewaist +ValueRef/RmwEntry/side-traits, the deterministicscan_hashSCAN cursor (ADR-0003), the freq-in-object eviction (the 2-bit frequency on each object), and the jemalloc accounting.Smaller follow-up to do first
Bounded-selection refill in
evict_to_fit_pooled(collect only the CAP coldest candidates via a bounded heap, dropping the per-refill O(resident) clone-all + sort). A cheap constant-factor eviction win that does NOT need the table rewrite, and it makes large-resident eviction feasible.Acceptance
A clear, uniform memory win over Dragonfly across keycounts (no doubling trough) AND O(1)-class segment-local eviction, with the speed/latency wins preserved, proven on the pinned-Linux + Dragonfly head-to-heads.
References
docs/bench/OPTIMIZATION_LOG.md("FULL COMPETITOR MATRIX", the DragonflyDB campaign synthesis, the amortized-eviction round 10).