v2.1.1
What's changed
- Update `CmSketch` to use block-based indexing, matching Caffeine. The 64-byte blocks are the same size as x86 cache lines. This scheme exploits the hardware by reducing L1 cache misses, since each increment or frequency call is guaranteed to use data from the same cache line (see the first sketch after this list).
- Vectorize the hot methods in `CmSketch` using AVX2 intrinsics (illustrated below). When combined with block indexing, this is 2x faster than the original implementation in benchmarks and gives 20% better `ConcurrentLfu` throughput when tested end to end.
- `ConcurrentLfu` uses a running value cache when comparing frequency (sketched below). In the best case this reduces the number of sketch frequency calls by 50%, improving throughput.
- Unrolled the loop in `CmSketch.Reset`, reducing reset execution time by about 40%. Reset is called periodically, so this reduces worst-case rather than average `ConcurrentLfu` maintenance time (an unrolled reset is sketched below).
- Implement a ThrowHelper invoked from all exception call sites, reducing the size of the generated asm. Also eliminated an unnecessary throw from the `ConcurrentLfu` hot path, a minor latency reduction when benchmarked (the pattern is sketched below).
- Increase the `ConcurrentLru` cycle count when evicting items. This prevents runaway growth when stress tested on AMD CPUs.
- `ConcurrentLfu` disposes items created but not cached when races occur during `GetOrAdd` (see the final sketch below).
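
Block-based indexing can be pictured with the minimal sketch below, assuming 4-bit counters packed 16 per `long` and eight `long`s (64 bytes) per block. The class, constants, and hash mixing are illustrative, not the library's actual `CmSketch` code; the point is that all four counters for an item are selected from the same 64-byte block, so one cache line serves each increment or frequency estimate.

```csharp
using System;

// Illustrative block-based count-min sketch: 4-bit counters, 16 per long,
// 8 longs (64 bytes) per block. Not the library's actual implementation.
public class BlockSketch
{
    private const int BlockLongs = 8;   // 8 x 8 bytes = 64 bytes = one x86 cache line
    private readonly long[] table;
    private readonly int blockMask;

    public BlockSketch(int blockCount)
    {
        // blockCount is assumed to be a power of two so a mask selects a block.
        table = new long[blockCount * BlockLongs];
        blockMask = blockCount - 1;
    }

    public void Increment(int hash)
    {
        // All four counters for this item live in one 64-byte block, so the
        // whole update touches a single cache line.
        int block = (hash & blockMask) * BlockLongs;
        uint h = Mix(hash);

        for (int i = 0; i < 4; i++)
        {
            int counter = (int)((h >> (i * 8)) & 127);   // 128 counters per block
            int index = block + (counter >> 4);           // which long in the block
            int shift = (counter & 15) << 2;              // which 4-bit nibble

            if (((table[index] >> shift) & 0xF) < 15)     // saturate at 15
            {
                table[index] += 1L << shift;
            }
        }
    }

    public int EstimateFrequency(int hash)
    {
        int block = (hash & blockMask) * BlockLongs;
        uint h = Mix(hash);
        int min = 15;

        for (int i = 0; i < 4; i++)
        {
            int counter = (int)((h >> (i * 8)) & 127);
            int index = block + (counter >> 4);
            int shift = (counter & 15) << 2;
            min = Math.Min(min, (int)((table[index] >> shift) & 0xF));
        }

        return min;
    }

    private static uint Mix(int x)
    {
        unchecked
        {
            uint h = (uint)x * 0x9E3779B1u;   // cheap avalanche so counters spread within the block
            return h ^ (h >> 16);
        }
    }
}
```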
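
The AVX2 change can be illustrated roughly as follows. This is a hedged sketch, not the library's vectorized code: it shows how the four per-item counters could be extracted with one variable shift and one mask via `System.Runtime.Intrinsics`, with a scalar fallback when AVX2 is unavailable. The method name and the shape of its inputs are assumptions for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Illustrative only: reads four packed 4-bit counters with a single variable
// shift and mask instead of four scalar shift-and-mask operations.
public static class SketchAvxExample
{
    // slots: the four longs holding this item's counters.
    // shifts: the bit offset of each 4-bit counter within its long.
    public static int MinOfFourCounters(long[] slots, int[] shifts)
    {
        if (!Avx2.IsSupported)
        {
            int min = 15;   // scalar fallback
            for (int i = 0; i < 4; i++)
                min = Math.Min(min, (int)((slots[i] >> shifts[i]) & 0xF));
            return min;
        }

        var values = Vector256.Create(slots[0], slots[1], slots[2], slots[3]);
        var counts = Vector256.Create((ulong)shifts[0], (ulong)shifts[1], (ulong)shifts[2], (ulong)shifts[3]);

        // Shift each lane by its own count (vpsrlvq), then keep the low nibble.
        Vector256<long> nibbles = Avx2.And(
            Avx2.ShiftRightLogicalVariable(values, counts),
            Vector256.Create(0xFL));

        long result = nibbles.GetElement(0);
        for (int i = 1; i < 4; i++)
            result = Math.Min(result, nibbles.GetElement(i));

        return (int)result;
    }
}
```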
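
The running value cache can be illustrated with a simplified admission loop. The `Admission` type and the `estimateFrequency` delegate below are stand-ins for the real sketch and policy types: the candidate's frequency is estimated once and reused for every victim comparison, so in the best case half of the sketch reads disappear.

```csharp
using System;
using System.Collections.Generic;

// Illustrative admission loop: the candidate frequency is a cached running
// value rather than being re-estimated per comparison. Not library code.
public class Admission
{
    private readonly Func<int, int> estimateFrequency;

    public Admission(Func<int, int> estimateFrequency)
    {
        this.estimateFrequency = estimateFrequency;
    }

    // Yields the hash of each item that should be evicted, given one candidate
    // and a queue of potential victims.
    public IEnumerable<int> Admit(int candidateHash, IEnumerable<int> victimHashes)
    {
        int candidateFreq = estimateFrequency(candidateHash);   // computed once, then cached

        foreach (int victimHash in victimHashes)
        {
            if (candidateFreq > estimateFrequency(victimHash))
            {
                yield return victimHash;      // candidate is more frequent: evict the victim
            }
            else
            {
                yield return candidateHash;   // victim wins: reject the candidate
                yield break;
            }
        }
    }
}
```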
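
A minimal sketch of the unrolled reset is shown below, assuming the usual count-min aging step that halves every packed 4-bit counter by shifting and masking, and a table length that is a multiple of four. The names and mask are illustrative rather than the library's exact code.

```csharp
// Illustrative unrolled reset: halves every packed 4-bit counter.
public static class ResetExample
{
    private const long ResetMask = 0x7777777777777777L;   // clears the bit shifted into each nibble

    public static void Reset(long[] table)
    {
        // Four independent statements per iteration cut loop overhead and give
        // the CPU more instruction-level parallelism than a simple loop.
        for (int i = 0; i < table.Length; i += 4)
        {
            table[i]     = (table[i]     >> 1) & ResetMask;
            table[i + 1] = (table[i + 1] >> 1) & ResetMask;
            table[i + 2] = (table[i + 2] >> 1) & ResetMask;
            table[i + 3] = (table[i + 3] >> 1) & ResetMask;
        }
    }
}
```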
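
The ThrowHelper item follows the common .NET pattern sketched here: the `throw` and exception construction move into a shared helper, so each call site emits only a call instruction and the hot method's generated asm stays small. The `Ex` and `BoundedBuffer` types are hypothetical examples, not library types.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Illustrative ThrowHelper pattern: exception construction is hoisted out of
// the hot method into a shared, never-returning helper.
internal static class Ex
{
    [DoesNotReturn]
    public static void ThrowArgumentOutOfRange(string paramName)
        => throw new ArgumentOutOfRangeException(paramName);
}

public class BoundedBuffer<T>
{
    private readonly T[] items;

    public BoundedBuffer(int capacity)
    {
        if (capacity <= 0)
        {
            // One call instruction here, instead of inline exception construction.
            Ex.ThrowArgumentOutOfRange(nameof(capacity));
        }

        items = new T[capacity];
    }

    public int Capacity => items.Length;
}
```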
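
Finally, the `GetOrAdd` race handling can be pictured with this simplified sketch built on `ConcurrentDictionary` rather than the library's internals: when two threads create a value for the same key, only one value is published, and the loser's value is disposed instead of leaked.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative dispose-on-race behavior; the real cache implementation differs.
public class DisposingCache<K, V>
    where K : notnull
    where V : class, IDisposable
{
    private readonly ConcurrentDictionary<K, V> map = new();

    public V GetOrAdd(K key, Func<K, V> valueFactory)
    {
        if (map.TryGetValue(key, out var existing))
        {
            return existing;
        }

        V created = valueFactory(key);
        V winner = map.GetOrAdd(key, created);

        if (!ReferenceEquals(winner, created))
        {
            // Another thread won the race; the value created here was never
            // visible to other callers, so it is safe to dispose it.
            created.Dispose();
        }

        return winner;
    }
}
```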
Full changelog: v2.1.0...v2.1.1