Releases: bitfaster/BitFaster.Caching
v2.2.1
What's changed
- Fix a `ConcurrentLru` bug where a repeated pattern of sequential key access could lead to unbounded growth.
- Use Span APIs within `MpscBoundedBuffer`/`StripedMpscBuffer`/`ConcurrentLfu` on .NET6/.NETCore3.1 build targets. Reduces `ConcurrentLfu` lookup latency by about 5-7% in the lookup benchmark.
Full changelog: v2.2.0...v2.2.1
v2.2.0
What's changed
- Provide a new overload for `ICache.GetOrAdd` enabling the value factory delegate to accept an input argument.
  `TValue GetOrAdd<TArg>(TKey key, Func<TKey, TArg, TValue> valueFactory, TArg factoryArgument)`
  If additional data is required to construct/fetch cached values, this provides a mechanism to pass data into the factory without allocating a new closure on the heap. Passing a `CancellationToken` into an async value factory delegate is a common use case.
- Implement equivalent factory arg functionality for `IAsyncCache`, `IScopedCache` and `IAsyncScopedCache`.
- To support different factory signatures without downstream code duplication, provide `IValueFactory` and `IAsyncValueFactory` value types.
- Implement build time package validation to prevent breaking changes going forward. Fixed all breaking changes introduced since v2.0.0. The v2.2.0 NET6 and NET3.1 build targets are fully compatible with v2.0.0 and v2.1.0 without recompilation. Intermediate point updates since v2.1.0 may require recompilation. The NET Standard 2.0 target is fully compatible with v2.0.0, but the updated event and metric are no longer included since they break compatibility.
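
The difference between a capturing factory and a factory that takes an argument can be sketched with `ConcurrentDictionary`, whose `GetOrAdd<TArg>` overload has the same shape as the signature above. This is a stand-in chosen so the snippet runs without the package; it is not the library's cache, and the variable names are illustrative only.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for a cache (assumption: ConcurrentDictionary is used here only
// because its GetOrAdd<TArg> overload mirrors the signature in the notes).
var cache = new ConcurrentDictionary<int, string>();
string prefix = "item-";

// Capturing lambda: 'prefix' is captured, so a closure object is allocated.
string a = cache.GetOrAdd(1, k => prefix + k);

// Argument-passing lambda: 'static' forbids captures, and 'prefix' is
// supplied through the factoryArgument parameter instead.
string b = cache.GetOrAdd(2, static (k, p) => p + k, prefix);

Console.WriteLine(a); // item-1
Console.WriteLine(b); // item-2
```

Marking the lambda `static` (C# 9 or later) makes the compiler reject accidental captures, which is a convenient way to confirm the factory really is closure-free.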
Full changelog: v2.1.3...v2.2.0
v2.1.3
What's changed
- Fix bug preventing ConcurrentTLru from expiring items if the host machine runs for longer than 49 days (on .NET Core 3/.NET6 only). This was a regression introduced in v2.1.2.
- TLRU TimeToLive is now validated for each policy implementation. This is a behavior change: invalid TTL values now throw ArgumentOutOfRangeException rather than silently setting an incorrect and/or negative TTL.
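
The release note does not state the root cause of the 49 day bug. As background only, 49 days is roughly the range of an unsigned 32-bit millisecond counter, which the arithmetic below illustrates. The wraparound part is a generic sketch of tick subtraction, not code from the library.

```csharp
using System;
using System.Globalization;

// Range of an unsigned 32-bit millisecond counter, expressed in days.
double days = TimeSpan.FromMilliseconds((double)uint.MaxValue).TotalDays;
Console.WriteLine(days.ToString("F2", CultureInfo.InvariantCulture)); // 49.71

// Generic illustration: unsigned subtraction still yields the correct
// elapsed time across a single wrap of the counter.
uint start = uint.MaxValue - 5;          // 5 ms before the counter wraps
uint now = unchecked(start + 10);        // 10 ms later, after wrapping to 4
uint elapsed = unchecked(now - start);
Console.WriteLine(elapsed); // 10
```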
Full changelog: v2.1.2...v2.1.3
v2.1.2
What's changed
- Added an ItemUpdated event for all LRU classes, including the scoped and atomic cache decorators.
- ConcurrentTLru/FastConcurrentTLru now use a clock based on Environment.TickCount64 for .NET Core 3 and .NET6 build targets, instead of Stopwatch.GetTimestamp. The smallest reliable time to live increases from about 1us to about 16ms (so precision is now worse), but the overhead of the TLRU policy drops significantly from about 170% to 20%. This seems like a good tradeoff, since expiring items faster than 16ms is not common. .NET Standard continues to use the previous high resolution clock since TickCount is 32 bit only on .NET Framework.
- On .NET Core 3 and .NET6 LruBuilder will automatically fall back to the previous higher resolution clock if the specified TTL is less than 32ms.
- Fixed Atomic cache count and enumeration methods such that partially created items are not visible externally. Count, enumerate and TryGet methods now all return consistent results if a factory delegate throws during item creation.
- Fixed the Atomic cache debug view; all caches now have a consistent debugger experience.
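
The two clocks mentioned above can be compared in a minimal expiry check. This is a sketch under stated assumptions: the local function names and the 10 minute TTL are hypothetical, and the library's own policy types are not shown.

```csharp
using System;
using System.Diagnostics;

TimeSpan ttl = TimeSpan.FromMinutes(10); // illustrative TTL

// Coarse clock: milliseconds since system start as a 64-bit value.
long coarseStart = Environment.TickCount64;

// High resolution clock: ticks at Stopwatch.Frequency per second.
long fineStart = Stopwatch.GetTimestamp();

// Hypothetical helpers: an item is expired once the elapsed time exceeds the TTL.
bool IsExpiredCoarse() =>
    Environment.TickCount64 - coarseStart > ttl.TotalMilliseconds;

bool IsExpiredFine() =>
    Stopwatch.GetTimestamp() - fineStart > ttl.TotalSeconds * Stopwatch.Frequency;

Console.WriteLine(IsExpiredCoarse()); // False
Console.WriteLine(IsExpiredFine());   // False
```

The coarse check is a single read and subtraction on a millisecond counter, which suits TTLs well above the counter's update interval.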
Full changelog: v2.1.1...v2.1.2
v2.1.1
What's changed
- Update `CmSketch` to use block-based indexing, matching Caffeine. The 64-byte blocks are the same size as x86 cache lines. This scheme exploits the hardware by reducing L1 cache misses, since each increment or frequency call is guaranteed to use data from the same cache line.
- Vectorize the hot methods in `CmSketch` using AVX2 intrinsics. When combined with block indexing, this is 2x faster than the original implementation in benchmarks and gives 20% better `ConcurrentLfu` throughput when tested end to end.
- `ConcurrentLfu` uses a running value cache when comparing frequency. In the best case this reduces the number of sketch frequency calls by 50%. Improves throughput.
- Unrolled the loop in `CmSketch.Reset`, reduces reset execution time by about 40%. This is called periodically so reduces worst case rather than average `ConcurrentLfu` maintenance time.
- Implement a ThrowHelper invoked from all exception call sites. Reduces the size of the generated asm. Eliminated an unnecessary throw from the `ConcurrentLfu` hot path, minor latency reduction when benchmarked.
- Increase `ConcurrentLru` cycle count when evicting items. Prevents runaway growth when stress tested on AMD CPUs.
- `ConcurrentLfu` disposes items created but not cached when races occur during `GetOrAdd`.
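
The throw helper idea mentioned above can be sketched as follows. The method names are hypothetical and this is not the library's ThrowHelper; it only shows the general pattern of moving `throw` statements out of hot methods into small dedicated methods.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

int capacity = Validate(3);
Console.WriteLine(capacity); // 3

// Hot-path method: contains only a compare and a call, not the code that
// constructs and throws the exception.
static int Validate(int capacity)
{
    if (capacity < 3)
    {
        ThrowCapacityOutOfRange();
    }

    return capacity;
}

// Hypothetical helper: the exception is constructed and thrown in one
// shared place. DoesNotReturn tells the compiler that control never comes back.
[DoesNotReturn]
static void ThrowCapacityOutOfRange() =>
    throw new ArgumentOutOfRangeException("capacity");
```

Keeping the throw in a separate method keeps the caller's body small, which is the usual motivation for this pattern.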
Full changelog: v2.1.0...v2.1.1
v2.1.0
What's Changed
- Added `ConcurrentLfu`, a .NET implementation of the W-TinyLfu admission policy. This closely follows the approach taken by the Caffeine library by Ben Manes, including buffered reads/writes and hill climbing to optimize hit rate. A `ConcurrentLfuBuilder` provides integration with the existing atomic value factory and scoped value features.
- To support `ConcurrentLfu` added the `MpscBoundedBuffer` and `StripedMpscBuffer` classes.
- To support `ConcurrentLfu` added the `ThreadPoolScheduler`, `BackgroundThreadScheduler` and `ForegroundScheduler` classes.
- Added the `Counter` class for fast concurrent counting, based on LongAdder by Doug Lea.
- Updated `ConcurrentLru` to use `Counter` for all metrics and added padding to internal queue counters. This improved throughput by about 2.5x with about 10% worse latency.
- Added DebuggerTypeProxy types to customize the debugger view of `FastConcurrentLru`, `ConcurrentLru`, `FastConcurrentTLru`, `ConcurrentTLru` and `ConcurrentLfu`.
- API documentation is now included in the NuGet package. Provided documentation for all public APIs.
- Rewrote and corrected bugs in the throughput analysis tests, which now support Read, Read + Write, Update and Evict scenarios.
Full changelog: v2.0.0...v2.1.0
v2.0.0
What's Changed
- Split `ICache` into `ICache`, `IAsyncCache`, `IScopedCache` and `IScopedAsyncCache` interfaces. Mixing sync and async code paths is problematic and generally discouraged. Splitting sync/async enables the most optimized code for each case. Scoped caches return `Lifetime<T>` instead of values, and internally have all the boilerplate code to safely resolve races.
- Added `ConcurrentLruBuilder`, providing a fluent builder API to ease creation of different cache configurations. Each cache option comes with a small performance overhead. The builder enables the developer to choose the exact combination of options needed, without any penalty from unused features.
- Cache interfaces have optional metrics, events and policy objects depending on the options chosen when constructing the cache.
- Implemented optional support for atomic GetOrAdd methods (configurable via ConcurrentLruBuilder), mitigating cache stampede.
- ConcurrentLru now has configurable hot, warm and cold queue size via `ICapacityPartition`. Default partition scheme changed from equal to 80% warm via `FavorWarmPartition`, improving hit rate across all tests.
- Fixed ConcurrentLru warmup, allowing items to enter the warm queue until warm is full. Improves hit rate across all tests.
- Added hit rate analysis tests for real world traces from Wikibench, ARC and glimpse workloads. This replicates the test suite used for Java's Caffeine.
- Async get methods now return `ValueTask`, reducing memory allocations.
- Added eviction count to cache metrics
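
Cache stampede, which the atomic GetOrAdd option above mitigates, is the case where many threads miss on the same key and all run the value factory. One well-known way to get a single factory invocation per key is to store `Lazy<T>` values in a `ConcurrentDictionary`. The sketch below shows that general technique only; it is an assumption-labeled illustration and not how the library implements its atomic option.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Generic technique, not the library's implementation: the dictionary stores
// Lazy<string>, so the expensive factory runs only when Value is read.
var cache = new ConcurrentDictionary<int, Lazy<string>>();
int factoryCalls = 0;

Parallel.For(0, 16, i =>
{
    // Several Lazy wrappers may be created under contention, but only one
    // is stored, and only the stored wrapper's factory is ever executed.
    Lazy<string> lazy = cache.GetOrAdd(42, k => new Lazy<string>(() =>
    {
        Interlocked.Increment(ref factoryCalls);
        return "value-" + k;
    }));

    string value = lazy.Value; // thread-safe: default mode is ExecutionAndPublication
});

Console.WriteLine(factoryCalls);    // 1
Console.WriteLine(cache[42].Value); // value-42
```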
Full changelog: v1.1.0...v2.0.0
v1.1.0
What's Changed
- Added `Trim(int itemCount)` to `ICache` and all derived classes
- Added `TrimExpired()` to TLRU classes
- When an item is updated TLRU classes reset the item timestamp, extending TTL
- Fixed TLRU long ticks policy on macOS
- Add `Cleared` and `Trimmed` to `ItemRemovedReason`
- Item removal event that fires on `Clear()` now has `ItemRemovedReason.Cleared` instead of `Evicted`
Full changelog: v1.0.7...v1.1.0
v1.0.7
Added diagnostic features to dump cache contents:
- ClassicLru/ConcurrentLru family implements IEnumerable<KeyValuePair<K,V>>. Enables enumeration of keys and values in the cache.
- ClassicLru/ConcurrentLru family implements ICollection Keys. Enables enumeration of the keys in the cache.
v1.0.6
- Implement ItemRemoved event on ConcurrentLru and ConcurrentTLru
- NuGet package now produced with deterministic build. See https://reproducible-builds.org/ for info