v2.1.1
What's changed
- Update `CmSketch` to use block-based indexing, matching Caffeine. The 64-byte blocks are the same size as x86 cache lines. This scheme exploits the hardware by reducing L1 cache misses, since each increment or frequency call is guaranteed to use data from the same cache line (see the first sketch after this list).
- Vectorize the hot methods in `CmSketch` using AVX2 intrinsics (illustrated below). When combined with block indexing, this is 2x faster than the original implementation in benchmarks and gives 20% better `ConcurrentLfu` throughput when tested end to end.
- `ConcurrentLfu` uses a running value cache when comparing frequency (sketched below). In the best case this reduces the number of sketch frequency calls by 50%, improving throughput.
- Unrolled the loop in `CmSketch.Reset`, reducing reset execution time by about 40%. Reset is called periodically, so this reduces worst-case rather than average `ConcurrentLfu` maintenance time (an unrolled reset is sketched below).
- Implement a ThrowHelper invoked from all exception call sites, reducing the size of the generated asm. Also eliminated an unnecessary throw from the `ConcurrentLfu` hot path, a minor latency reduction when benchmarked (the pattern is sketched below).
- Increase the `ConcurrentLru` cycle count when evicting items. This prevents runaway growth when stress tested on AMD CPUs.
- `ConcurrentLfu` disposes items created but not cached when races occur during `GetOrAdd` (see the final sketch below).
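
Block-based indexing can be pictured with the minimal sketch below, assuming 4-bit counters packed 16 per `long` and eight `long`s (64 bytes) per block. The class, constants, and hash mixing are illustrative, not the library's actual `CmSketch` code; the point is that all four counters for an item are selected from the same 64-byte block, so one cache line serves each increment or frequency estimate.

```csharp
using System;

// Illustrative block-based count-min sketch: 4-bit counters, 16 per long,
// 8 longs (64 bytes) per block. Not the library's actual implementation.
public class BlockSketch
{
    private const int BlockLongs = 8;   // 8 x 8 bytes = 64 bytes = one x86 cache line
    private readonly long[] table;
    private readonly int blockMask;

    public BlockSketch(int blockCount)
    {
        // blockCount is assumed to be a power of two so a mask selects a block.
        table = new long[blockCount * BlockLongs];
        blockMask = blockCount - 1;
    }

    public void Increment(int hash)
    {
        // All four counters for this item live in one 64-byte block, so the
        // whole update touches a single cache line.
        int block = (hash & blockMask) * BlockLongs;
        uint h = Mix(hash);

        for (int i = 0; i < 4; i++)
        {
            int counter = (int)((h >> (i * 8)) & 127);   // 128 counters per block
            int index = block + (counter >> 4);           // which long in the block
            int shift = (counter & 15) << 2;              // which 4-bit nibble

            if (((table[index] >> shift) & 0xF) < 15)     // saturate at 15
            {
                table[index] += 1L << shift;
            }
        }
    }

    public int EstimateFrequency(int hash)
    {
        int block = (hash & blockMask) * BlockLongs;
        uint h = Mix(hash);
        int min = 15;

        for (int i = 0; i < 4; i++)
        {
            int counter = (int)((h >> (i * 8)) & 127);
            int index = block + (counter >> 4);
            int shift = (counter & 15) << 2;
            min = Math.Min(min, (int)((table[index] >> shift) & 0xF));
        }

        return min;
    }

    private static uint Mix(int x)
    {
        unchecked
        {
            uint h = (uint)x * 0x9E3779B1u;   // cheap avalanche so counters spread within the block
            return h ^ (h >> 16);
        }
    }
}
```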
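
The AVX2 change can be illustrated roughly as follows. This is a hedged sketch, not the library's vectorized code: it shows how the four per-item counters could be extracted with one variable shift and one mask via `System.Runtime.Intrinsics`, with a scalar fallback when AVX2 is unavailable. The method name and the shape of its inputs are assumptions for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Illustrative only: reads four packed 4-bit counters with a single variable
// shift and mask instead of four scalar shift-and-mask operations.
public static class SketchAvxExample
{
    // slots: the four longs holding this item's counters.
    // shifts: the bit offset of each 4-bit counter within its long.
    public static int MinOfFourCounters(long[] slots, int[] shifts)
    {
        if (!Avx2.IsSupported)
        {
            int min = 15;   // scalar fallback
            for (int i = 0; i < 4; i++)
                min = Math.Min(min, (int)((slots[i] >> shifts[i]) & 0xF));
            return min;
        }

        var values = Vector256.Create(slots[0], slots[1], slots[2], slots[3]);
        var counts = Vector256.Create((ulong)shifts[0], (ulong)shifts[1], (ulong)shifts[2], (ulong)shifts[3]);

        // Shift each lane by its own count (vpsrlvq), then keep the low nibble.
        Vector256<long> nibbles = Avx2.And(
            Avx2.ShiftRightLogicalVariable(values, counts),
            Vector256.Create(0xFL));

        long result = nibbles.GetElement(0);
        for (int i = 1; i < 4; i++)
            result = Math.Min(result, nibbles.GetElement(i));

        return (int)result;
    }
}
```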
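
The running value cache can be illustrated with a simplified admission loop. The `Admission` type and the `estimateFrequency` delegate below are stand-ins for the real sketch and policy types: the candidate's frequency is estimated once and reused for every victim comparison, so in the best case half of the sketch reads disappear.

```csharp
using System;
using System.Collections.Generic;

// Illustrative admission loop: the candidate frequency is a cached running
// value rather than being re-estimated per comparison. Not library code.
public class Admission
{
    private readonly Func<int, int> estimateFrequency;

    public Admission(Func<int, int> estimateFrequency)
    {
        this.estimateFrequency = estimateFrequency;
    }

    // Yields the hash of each item that should be evicted, given one candidate
    // and a queue of potential victims.
    public IEnumerable<int> Admit(int candidateHash, IEnumerable<int> victimHashes)
    {
        int candidateFreq = estimateFrequency(candidateHash);   // computed once, then cached

        foreach (int victimHash in victimHashes)
        {
            if (candidateFreq > estimateFrequency(victimHash))
            {
                yield return victimHash;      // candidate is more frequent: evict the victim
            }
            else
            {
                yield return candidateHash;   // victim wins: reject the candidate
                yield break;
            }
        }
    }
}
```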
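
A minimal sketch of the unrolled reset is shown below, assuming the usual count-min aging step that halves every packed 4-bit counter by shifting and masking, and a table length that is a multiple of four. The names and mask are illustrative rather than the library's exact code.

```csharp
// Illustrative unrolled reset: halves every packed 4-bit counter.
public static class ResetExample
{
    private const long ResetMask = 0x7777777777777777L;   // clears the bit shifted into each nibble

    public static void Reset(long[] table)
    {
        // Four independent statements per iteration cut loop overhead and give
        // the CPU more instruction-level parallelism than a simple loop.
        for (int i = 0; i < table.Length; i += 4)
        {
            table[i]     = (table[i]     >> 1) & ResetMask;
            table[i + 1] = (table[i + 1] >> 1) & ResetMask;
            table[i + 2] = (table[i + 2] >> 1) & ResetMask;
            table[i + 3] = (table[i + 3] >> 1) & ResetMask;
        }
    }
}
```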
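
The ThrowHelper item follows the common .NET pattern sketched here: the `throw` and exception construction move into a shared helper, so each call site emits only a call instruction and the hot method's generated asm stays small. The `Ex` and `BoundedBuffer` types are hypothetical examples, not library types.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Illustrative ThrowHelper pattern: exception construction is hoisted out of
// the hot method into a shared, never-returning helper.
internal static class Ex
{
    [DoesNotReturn]
    public static void ThrowArgumentOutOfRange(string paramName)
        => throw new ArgumentOutOfRangeException(paramName);
}

public class BoundedBuffer<T>
{
    private readonly T[] items;

    public BoundedBuffer(int capacity)
    {
        if (capacity <= 0)
        {
            // One call instruction here, instead of inline exception construction.
            Ex.ThrowArgumentOutOfRange(nameof(capacity));
        }

        items = new T[capacity];
    }

    public int Capacity => items.Length;
}
```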
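
Finally, the `GetOrAdd` race handling can be pictured with this simplified sketch built on `ConcurrentDictionary` rather than the library's internals: when two threads create a value for the same key, only one value is published, and the loser's value is disposed instead of leaked.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative dispose-on-race behavior; the real cache implementation differs.
public class DisposingCache<K, V>
    where K : notnull
    where V : class, IDisposable
{
    private readonly ConcurrentDictionary<K, V> map = new();

    public V GetOrAdd(K key, Func<K, V> valueFactory)
    {
        if (map.TryGetValue(key, out var existing))
        {
            return existing;
        }

        V created = valueFactory(key);
        V winner = map.GetOrAdd(key, created);

        if (!ReferenceEquals(winner, created))
        {
            // Another thread won the race; the value created here was never
            // visible to other callers, so it is safe to dispose it.
            created.Dispose();
        }

        return winner;
    }
}
```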
Full changelog: v2.1.0...v2.1.1