-
Notifications
You must be signed in to change notification settings - Fork 457
Reduce allocations in FeatureFlags.IsEnabled #11076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
public sealed class StringUtilsTests | ||
{ | ||
[Theory] | ||
[InlineData("FeatureA,FeatureB,FeatureC", "FeatureB", ',', true)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we account for whitespace and empty tokens?
"FeatureA, FeatureB,FeatureC"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally left that out to maintain parity with the previous version, which also didn’t handle trimming. In my benchmark repository, I’ve included a unit test that compares the outputs of both implementations to ensure the functionality remains consistent.
|
||
if (separatorIndex >= 0) | ||
{ | ||
currentToken = remaining.Slice(0, separatorIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing currentToken and searchToken here should save some nanoseconds. You can skip a slice if currentToken matches the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I tried that approach and ran benchmarks to compare those 2.
Linux
https://github.com/kshyju/Benchmarks/actions/runs/15201876152/job/42757382146#step:7:1378
Method | Mean |
---|---|
ContainsToken_Short_Present | 11.548 ns |
ContainsToken_CheckInline_Short_Present | 10.645 ns |
ContainsToken_Short_Absent | 10.644 ns |
ContainsToken_CheckInline_Short_Absent | 10.330 ns |
ContainsToken_Medium_Present | 31.076 ns |
ContainsToken_CheckInline_Medium_Present | 31.335 ns |
ContainsToken_Medium_Absent | 32.303 ns |
ContainsToken_CheckInline_Medium_Absent | 32.730 ns |
ContainsToken_Empty | 1.332 ns |
ContainsToken_CheckInline_Empty | 1.338 ns |
Windows
https://github.com/kshyju/Benchmarks/actions/runs/15201876152/job/42757382306#step:7:1387
Method | Mean |
---|---|
ContainsToken_Short_Present | 11.196 ns |
ContainsToken_CheckInline_Short_Present | 10.909 ns |
ContainsToken_Short_Absent | 10.588 ns |
ContainsToken_CheckInline_Short_Absent | 10.291 ns |
ContainsToken_Medium_Present | 30.436 ns |
ContainsToken_CheckInline_Medium_Present | 36.290 ns |
ContainsToken_Medium_Absent | 31.417 ns |
ContainsToken_CheckInline_Medium_Absent | 30.823 ns |
ContainsToken_Empty | 1.586 ns |
ContainsToken_CheckInline_Empty | 1.593 ns |
Comparing them:
Platform | Input | Match | Variable (ns) | Inline (ns) | Δ (ns) | Winner |
---|---|---|---|---|---|---|
Linux | Short | Present | 11.55 | 10.65 | 0.90 | Inline |
Linux | Short | Absent | 10.64 | 10.33 | 0.31 | Inline |
Linux | Medium | Present | 31.08 | 31.34 | -0.26 | Variable |
Linux | Medium | Absent | 32.30 | 32.73 | -0.43 | Variable |
Linux | Empty | — | 1.33 | 1.34 | -0.01 | Tie |
Windows | Short | Present | 11.20 | 10.91 | 0.29 | Inline |
Windows | Short | Absent | 10.59 | 10.29 | 0.30 | Inline |
Windows | Medium | Present | 30.44 | 36.29 | -5.85 | Variable |
Windows | Medium | Absent | 31.42 | 30.82 | 0.60 | Inline |
Windows | Empty | — | 1.59 | 1.59 | 0.00 | Tie |
TLDR:
- The inline version is slightly faster for short inputs, but the difference is typically less than 1 nanosecond.
- For medium and empty inputs, performance is nearly identical, with the variable version sometimes ahead.
I personally prefer the variable version (currently in the PR) since I find it more readable with 2 sections. The top section snippet handles token extraction, and the bottom handles the comparison. Do you find the inline version more readable? I am open to switching to that if more people find that readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I didn't expect inline version to take longer compared to the variable version especially when there's a match (we will save one slice). I find the variable version more readable as you would need another level of nesting in the inline version. I am fine if you want to keep your existing version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I expect results from local dev workstation to have higher variance/noise (std dev) than the one on servers. From all these 3 results, looks like the Linux server has the least variance. I still vote for readability over 5 ns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #10883
Optimizes memory allocations in
FeatureFlags.IsEnabled
by adopting the zero-allocationStringUtils.ContainsToken
utility for efficient token search in delimited strings.This method gets called many times during specialization.
I tested with a non-empty value for
AzureWebJobsFeatureFlags
.Allocation Comparison
Before
FeatureFlags.IsEnabled
After
FeatureFlags.IsEnabled
Delta
FeatureFlags.IsEnabled(string, IEnvironment)
– Total allocations dropped by 430 (66.67%)FeatureFlags.IsEnabled(string, IEnvironment)
– Total allocated size dropped by 13,760 bytes (50.80%)Benchmark.NET Micro Benchmarks
Also ran micro benchmark tests to compare the performance of the new implementation with the old one.
source code used: https://github.com/kshyju/Benchmarks/blob/9fe638a43458591991317a72af4b7b7cd30ed6a9/src/Benchmarks.ConsoleApp/StringSplitBenchmarks.cs#L6
Windows
Benchmark Report
ContainsUsingStringSplit_Short_Present
ContainsToken_Short_Present
ContainsUsingStringSplit_Short_Absent
ContainsToken_Short_Absent
ContainsUsingStringSplit_Medium_Present
ContainsToken_Medium_Present
ContainsUsingStringSplit_Medium_Absent
ContainsToken_Medium_Absent
ContainsUsingStringSplit_Empty
ContainsToken_Empty
Linux (Ubuntu)
Benchmark Report
ContainsUsingStringSplit_Short_Present
ContainsToken_Short_Present
ContainsUsingStringSplit_Short_Absent
ContainsToken_Short_Absent
ContainsUsingStringSplit_Medium_Present
ContainsToken_Medium_Present
ContainsUsingStringSplit_Medium_Absent
ContainsToken_Medium_Absent
ContainsUsingStringSplit_Empty
ContainsToken_Empty
Pull request checklist
IMPORTANT: Currently, changes must be backported to the
in-proc
branch to be included in Core Tools and non-Flex deployments.in-proc
branch is not requiredrelease_notes.md
Additional information
Additional PR information