Releases · bitfaster/BitFaster.Caching

06 Oct 01:22

bitfaster

v2.3.0

4330c16

What's changed

Align TryRemove overloads with ConcurrentDictionary for ICache (including WithAtomicGetOrAdd). This adds two new overloads:
- bool TryRemove(K key, out V value) - enables getting the value that was removed.
- bool TryRemove(KeyValuePair<K, V> item) - enables removing an item only when the key and value are the same.
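
Since the new overloads are described as aligned with ConcurrentDictionary, the semantics can be sketched with ConcurrentDictionary itself standing in for the cache. The class and method names below are hypothetical, and the KeyValuePair overload of ConcurrentDictionary.TryRemove requires .NET 5 or later.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper; ConcurrentDictionary stands in for an ICache here.
public static class TryRemoveSketch
{
    // Removes the key and hands back the value that was stored under it.
    public static (bool removed, int value) RemoveAndGet(
        ConcurrentDictionary<string, int> map, string key)
    {
        bool removed = map.TryRemove(key, out int value);
        return (removed, value);
    }

    // Removes the entry only if the key is still mapped to the expected value.
    public static bool RemoveIfUnchanged(
        ConcurrentDictionary<string, int> map, string key, int expected)
    {
        return map.TryRemove(new KeyValuePair<string, int>(key, expected));
    }
}
```

The second form is useful when another thread may have replaced the value in the meantime: the removal is skipped rather than discarding the newer value.
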
Fix ConcurrentLfu.Clear() to remove all values when using BackgroundThreadScheduler. Previously, values could be left behind after Clear was called because removed items still present in the window/protected/probation queues polluted the list of removal candidates.
Fix ConcurrentLru.Clear() to reset the isWarm flag. Now cache warmup behaves the same for a new instance of ConcurrentLru vs an existing instance that was full then cleared. Previously ConcurrentLru could have reduced capacity during warmup after calling clear, depending on the access pattern.
Add extension methods to make it more convenient to use AtomicFactory with a plain ConcurrentDictionary. This is similar to storing a Lazy<T> instead of T, but with the same exception propagation semantics and API as ConcurrentDictionary.GetOrAdd.
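
For comparison, the Lazy<T> pattern mentioned above can be sketched as follows. This is not the AtomicFactory extension API; it uses only ConcurrentDictionary and Lazy<T>, and the class, method and counter names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of the "store Lazy<T> instead of T" pattern.
public static class LazyCacheSketch
{
    // Counts how many times the value factory actually ran.
    public static int FactoryCalls;

    public static int GetOrAddOnce(
        ConcurrentDictionary<string, Lazy<int>> map, string key)
    {
        // Racing callers may each construct a Lazy wrapper, but only one
        // wrapper is stored, and only that wrapper's factory is executed.
        Lazy<int> lazy = map.GetOrAdd(key, k => new Lazy<int>(() =>
        {
            Interlocked.Increment(ref FactoryCalls);
            return k.Length; // stand-in for an expensive computation
        }));

        return lazy.Value;
    }
}
```

One difference worth noting: in its default thread-safety mode, Lazy<T> caches an exception thrown by the factory and rethrows it on every later access, whereas ConcurrentDictionary.GetOrAdd simply propagates the exception and stores nothing.
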

Full changelog: v2.2.1...v2.3.0

22 Aug 02:00

bitfaster

v2.2.1

929b2cf

What's changed

Fix a ConcurrentLru bug where a repeated pattern of sequential key access could lead to unbounded growth.
Use Span APIs within MpscBoundedBuffer/StripedMpscBuffer/ConcurrentLfu on .NET6/.NETCore3.1 build targets. Reduces ConcurrentLfu lookup latency by about 5-7% in the lookup benchmark.

Full changelog: v2.2.0...v2.2.1

29 Apr 22:42

bitfaster

v2.2.0

73c4e1b

What's changed

Provide a new overload for ICache.GetOrAdd enabling the value factory delegate to accept an input argument.

TValue GetOrAdd<TArg> (TKey key, Func<TKey,TArg,TValue> valueFactory, TArg factoryArgument)

If additional data is required to construct/fetch cached values, this provides a mechanism to pass data into the factory without allocating a new closure on the heap. Passing a CancellationToken into an async value factory delegate is a common use case.
Implement equivalent factory arg functionality for IAsyncCache, IScopedCache and IScopedAsyncCache.
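
The closure-free pattern can be illustrated with ConcurrentDictionary, whose GetOrAdd<TArg> overload has the same shape as the signature shown above. ConcurrentDictionary is only a stand-in for the cache here, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical comparison of the two ways to pass extra data to a factory.
public static class FactoryArgSketch
{
    // The lambda captures 'prefix', so a closure object and a delegate
    // are allocated on each call.
    public static string WithClosure(
        ConcurrentDictionary<int, string> map, int key, string prefix)
    {
        return map.GetOrAdd(key, k => prefix + k);
    }

    // The static lambda captures nothing; 'prefix' travels as the factory
    // argument, so the compiler can reuse a single cached delegate.
    public static string WithArg(
        ConcurrentDictionary<int, string> map, int key, string prefix)
    {
        return map.GetOrAdd(key, static (k, p) => p + k, prefix);
    }
}
```

Both methods return the same value; the difference is only in what is allocated per call. The static lambda modifier requires C# 9 or later.
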
To support different factory signatures without downstream code duplication, provide IValueFactory and IAsyncValueFactory value types.
Implement build time package validation to prevent breaking changes going forward. Fixed all breaking changes introduced since v2.0.0. The v2.2.0 .NET6 and .NET Core 3.1 build targets are fully compatible with v2.0.0 and v2.1.0 without recompilation. Intermediate point updates since v2.1.0 may require recompilation. The .NET Standard 2.0 target is fully compatible with v2.0.0, but the updated event and metric are no longer included since they break compatibility.

Full changelog: v2.1.3...v2.2.0

17 Mar 01:05

bitfaster

v2.1.3

8b62b7b

What's changed

Fix bug preventing ConcurrentTLru from expiring items if the host machine runs for longer than 49 days (on .NET Core 3/.NET6 only). This was a regression introduced in v2.1.2.
TLRU TimeToLive is now validated for each policy implementation. This is a behavior change: invalid TTL values now throw ArgumentOutOfRangeException rather than silently setting an incorrect and/or negative TTL.
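
The figure of about 49 days coincides with the period of a 32-bit millisecond counter. The release note does not state the root cause, so treating counter wraparound as the mechanism is an assumption; the sketch below only shows the arithmetic and a wraparound-safe way to measure elapsed ticks. All names are hypothetical.

```csharp
using System;

// Hypothetical sketch: arithmetic behind a 32-bit millisecond tick counter.
public static class TickWrapSketch
{
    // 2^32 milliseconds expressed in days (a little under 50 days).
    public static double WrapPeriodDays()
    {
        return TimeSpan.FromMilliseconds(uint.MaxValue + 1.0).TotalDays;
    }

    // Elapsed ticks between two 32-bit readings. Unchecked subtraction
    // stays correct across a single wrap from int.MaxValue to int.MinValue.
    public static int Elapsed(int startTicks, int nowTicks)
    {
        return unchecked(nowTicks - startTicks);
    }
}
```

Comparing raw 32-bit readings with less-than or greater-than, instead of subtracting them, is the kind of code that works until the counter wraps and then stops expiring anything.
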

Full changelog: v2.1.2...v2.1.3

11 Mar 01:32

bitfaster

v2.1.2

96c5345

What's changed

Added an ItemUpdated event for all LRU classes, including the scoped and atomic cache decorators.
ConcurrentTLru/FastConcurrentTLru now use a clock based on Environment.TickCount64 for .NET Core 3 and .NET6 build targets, instead of Stopwatch.GetTimestamp. The smallest reliable time to live increases from about 1us to about 16ms (so precision is now worse), but the overhead of the TLRU policy drops significantly from about 170% to 20%. This seems like a good tradeoff, since expiring items faster than 16ms is not common. .NET Standard continues to use the previous high resolution clock since TickCount is 32 bit only on .NET Framework.
On .NET Core 3 and .NET6 LruBuilder will automatically fall back to the previous higher resolution clock if the specified TTL is less than 32ms.
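
The fallback rule described above can be sketched as a simple threshold check. This is not the LruBuilder implementation; the enum, class and method names are hypothetical, and only the 32ms threshold and the two clock sources come from the notes above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch of choosing a clock source from the requested TTL.
public static class ClockChoiceSketch
{
    public enum ClockKind { TickCount64, Stopwatch }

    // Below this TTL the coarse clock (about 16ms resolution) is too imprecise.
    public static readonly TimeSpan Threshold = TimeSpan.FromMilliseconds(32);

    public static ClockKind Choose(TimeSpan timeToLive)
    {
        return timeToLive < Threshold ? ClockKind.Stopwatch : ClockKind.TickCount64;
    }

    // Reads the current timestamp from the chosen source. The units differ:
    // milliseconds for TickCount64, Stopwatch ticks for Stopwatch.
    public static long Now(ClockKind kind)
    {
        return kind == ClockKind.Stopwatch
            ? Stopwatch.GetTimestamp()
            : Environment.TickCount64;
    }
}
```

Environment.TickCount64 is available on .NET Core 3.0 and later, which matches the build targets named above.
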
Fixed Atomic cache count and enumeration methods such that partially created items are not visible externally. Count, enumerate and TryGet methods now all return consistent results if a factory delegate throws during item creation.
Fixed the Atomic cache debug view; all caches now have a consistent debugger experience.

Full changelog: v2.1.1...v2.1.2

14 Oct 04:21

bitfaster

v2.1.1

d0236c6

What's changed

Update CmSketch to use block-based indexing, matching Caffeine. The 64-byte blocks are the same size as x86 cache lines. This scheme exploits the hardware by reducing L1 cache misses, since each increment or frequency call is guaranteed to use data from the same cache line.
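
A simplified sketch of block-based indexing follows. It is not the library's CmSketch: the class name, hash mixing function and counter layout are illustrative assumptions. Each key maps to one block of eight 64-bit words (64 bytes), and all four of its 4-bit counters live inside that block.

```csharp
using System;

// Hypothetical, simplified count-min sketch with block-based indexing.
public sealed class BlockSketch
{
    private readonly ulong[] table;   // 8 words per block = 64 bytes
    private readonly uint blockMask;

    public BlockSketch(int blocks)
    {
        if (blocks <= 0 || (blocks & (blocks - 1)) != 0)
            throw new ArgumentException("blocks must be a power of two", nameof(blocks));
        table = new ulong[blocks * 8];
        blockMask = (uint)(blocks - 1);
    }

    public void Increment(int key)
    {
        for (int i = 0; i < 4; i++)
        {
            Locate(key, i, out int slot, out int shift);
            ulong mask = 0xFUL << shift;
            // 4-bit counters saturate at 15 instead of overflowing.
            if ((table[slot] & mask) != mask) table[slot] += 1UL << shift;
        }
    }

    public int Estimate(int key)
    {
        int min = 15;
        for (int i = 0; i < 4; i++)
        {
            Locate(key, i, out int slot, out int shift);
            min = Math.Min(min, (int)((table[slot] >> shift) & 0xF));
        }
        return min;
    }

    // Row i uses words (2i, 2i+1) of the key's block, so the four rows
    // never share a word and every access stays within one 64-byte block.
    private void Locate(int key, int i, out int slot, out int shift)
    {
        uint blockHash = Mix((uint)key);
        uint h = Mix(blockHash) >> (i << 3);
        int block = (int)(blockHash & blockMask) << 3;
        slot = block + (i << 1) + (int)(h & 1);
        shift = (int)((h >> 1) & 15) << 2;
    }

    // Illustrative integer mixer, not the hash used by the library.
    private static uint Mix(uint x)
    {
        x ^= x >> 16;
        x *= 0x45d9f3bU;
        x ^= x >> 16;
        return x;
    }
}
```

The sketch does not force the array to be 64-byte aligned, so a block here may still straddle two cache lines; it only demonstrates the indexing scheme.
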
Vectorize the hot methods in CmSketch using AVX2 intrinsics. When combined with block indexing, this is 2x faster than the original implementation in benchmarks and gives 20% better ConcurrentLfu throughput when tested end to end.
ConcurrentLfu now uses a running value cache when comparing frequencies. In the best case this halves the number of sketch frequency calls, improving throughput.
Unrolled the loop in CmSketch.Reset, reducing reset execution time by about 40%. Reset is called periodically, so this reduces worst-case rather than average ConcurrentLfu maintenance time.
Implement a ThrowHelper invoked from all exception call sites, reducing the size of the generated asm. Eliminated an unnecessary throw from the ConcurrentLfu hot path, giving a minor latency reduction when benchmarked.
Increase ConcurrentLru cycle count when evicting items. Prevents runaway growth when stress tested on AMD CPUs.
ConcurrentLfu disposes items created but not cached when races occur during GetOrAdd.

Full changelog: v2.1.0...v2.1.1

11 Sep 02:29

bitfaster

v2.1.0

f4cfa6b

What's Changed

Added ConcurrentLfu, a .NET implementation of the W-TinyLfu admission policy. This closely follows the approach taken by the Caffeine library by Ben Manes - including buffered reads/writes and hill climbing to optimize hit rate. A ConcurrentLfuBuilder provides integration with the existing atomic value factory and scoped value features.
To support ConcurrentLfu added the MpscBoundedBuffer and StripedMpscBuffer classes.
To support ConcurrentLfu added the ThreadPoolScheduler, BackgroundThreadScheduler and ForegroundScheduler classes.
Added the Counter class for fast concurrent counting, based on LongAdder by Doug Lea.
Updated ConcurrentLru to use Counter for all metrics and added padding to internal queue counters. This improved throughput by about 2.5x with about 10% worse latency.
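
The idea behind a LongAdder-style counter can be sketched as a set of striped, spaced-out cells. This is a simplification and not the library's Counter class: LongAdder grows its cell array dynamically under contention, while this hypothetical version uses a fixed number of stripes selected by thread id.

```csharp
using System;
using System.Threading;

// Hypothetical fixed-stripe counter illustrating the striping idea.
public sealed class StripedCounterSketch
{
    private const int Stripes = 8;  // power of two, so masking picks a stripe
    private const int Stride = 8;   // 8 longs = 64 bytes between used cells

    private readonly long[] cells = new long[Stripes * Stride];

    // Threads with different ids tend to hit different cells, which
    // reduces contention on any single memory location.
    public void Increment()
    {
        int stripe = Environment.CurrentManagedThreadId & (Stripes - 1);
        Interlocked.Increment(ref cells[stripe * Stride]);
    }

    // The sum is not an atomic snapshot while writers are active,
    // which is usually acceptable for metrics.
    public long Sum()
    {
        long total = 0;
        for (int s = 0; s < Stripes; s++)
        {
            total += Volatile.Read(ref cells[s * Stride]);
        }
        return total;
    }
}
```

Spacing the cells 64 bytes apart plays the same role as the padding mentioned above: writers to neighbouring cells are less likely to invalidate each other's cache lines.
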
Added DebuggerTypeProxy types to customize the debugger view of FastConcurrentLru, ConcurrentLru, FastConcurrentTLru, ConcurrentTLru and ConcurrentLfu.
API documentation is now included in the NuGet package. Provided documentation for all public APIs.
Rewrote and corrected bugs in the throughput analysis tests, which now support Read, Read + Write, Update and Evict scenarios.

Full changelog: v2.0.0...v2.1.0

29 Jul 01:35

bitfaster

v2.0.0

90318ed

What's Changed

Split ICache into ICache, IAsyncCache, IScopedCache and IScopedAsyncCache interfaces. Mixing sync and async code paths is problematic and generally discouraged. Splitting sync/async enables the most optimized code for each case. Scoped caches return Lifetime<T> instead of values, and internally have all the boilerplate code to safely resolve races.
Added ConcurrentLruBuilder, providing a fluent builder API to ease creation of different cache configurations. Each cache option comes with a small performance overhead. The builder enables the developer to choose the exact combination of options needed, without any penalty from unused features.
Cache interfaces have optional metrics, events and policy objects depending on the options chosen when constructing the cache.
Implemented optional support for atomic GetOrAdd methods (configurable via ConcurrentLruBuilder), mitigating cache stampede.
ConcurrentLru now has configurable hot, warm and cold queue size via ICapacityPartition. Default partition scheme changed from equal to 80% warm via FavorWarmPartition, improving hit rate across all tests.
Fixed ConcurrentLru warmup, allowing items to enter the warm queue until warm is full. Improves hit rate across all tests.
Added hit rate analysis tests for real world traces from Wikibench, ARC and glimpse workloads. This replicates the test suite used for Java's Caffeine.
Async get methods now return ValueTask, reducing memory allocations.
Added eviction count to cache metrics

Full changelog: v1.1.0...v2.0.0

01 Jul 01:08

bitfaster

v1.1.0

4100c50

What's Changed

Added Trim(int itemCount) to ICache and all derived classes
Added TrimExpired() to TLRU classes
When an item is updated TLRU classes reset the item timestamp, extending TTL
Fixed TLRU long ticks policy on macOS
Add Cleared and Trimmed to ItemRemovedReason
Item removal event that fires on Clear() now has ItemRemovedReason.Cleared instead of Evicted

Full changelog: v1.0.7...v1.1.0

13 Feb 01:07

bitfaster

v1.0.7

e52e4eb

Added diagnostic features to dump cache contents:

ClassicLru/ConcurrentLru family implements IEnumerable<KeyValuePair<K,V>>. Enables enumeration of keys and values in the cache.
ClassicLru/ConcurrentLru family exposes an ICollection<K> Keys property. Enables enumeration of the keys in the cache.
