Reduce allocations in FeatureFlags.IsEnabled #11076

Open: wants to merge 2 commits into base branch dev

Conversation

@kshyju (Member) commented May 22, 2025

Fixes #10883

Optimizes memory allocations in FeatureFlags.IsEnabled by adopting the zero-allocation StringUtils.ContainsToken utility for efficient token search in delimited strings.

This method gets called many times during specialization.
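
To illustrate the idea, here is a minimal sketch of a span-based token search. It is not the code from this PR: the class name `TokenSearchSketch`, the method signature, and the case-insensitive ordinal comparison are assumptions made for illustration. It walks the delimited string with `ReadOnlySpan<char>` slices, so no substrings or arrays are allocated.

```csharp
using System;

// Hypothetical stand-in for StringUtils.ContainsToken; the real signature may differ.
public static class TokenSearchSketch
{
    public static bool ContainsToken(string source, string token, char delimiter)
    {
        if (string.IsNullOrEmpty(source))
        {
            return false;
        }

        ReadOnlySpan<char> remaining = source.AsSpan();
        ReadOnlySpan<char> searchToken = token.AsSpan();

        while (true)
        {
            // Top section: extract the next token without allocating.
            int separatorIndex = remaining.IndexOf(delimiter);
            ReadOnlySpan<char> currentToken;

            if (separatorIndex >= 0)
            {
                currentToken = remaining.Slice(0, separatorIndex);
                remaining = remaining.Slice(separatorIndex + 1);
            }
            else
            {
                currentToken = remaining;
                remaining = ReadOnlySpan<char>.Empty;
            }

            // Bottom section: compare the token (assumed case-insensitive, no trimming).
            if (currentToken.Equals(searchToken, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }

            if (separatorIndex < 0)
            {
                // That was the last token and it did not match.
                return false;
            }
        }
    }
}
```

With this sketch, `TokenSearchSketch.ContainsToken("EnableWorkerIndexing", "EnableWorkerIndexing", ',')` returns `true`, and searching the same value for `"SomeOtherFlag"` returns `false`.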

Allocation Snapshot

I tested with a non-empty value for AzureWebJobsFeatureFlags.

"AzureWebJobsFeatureFlags": "EnableWorkerIndexing"

Allocation Comparison

Before

| Name | Total Allocations | Self Allocations | Total Size (Bytes) | Self Size (Bytes) |
| --- | --- | --- | --- | --- |
| FeatureFlags.IsEnabled | 645 | 0 | 27,090 | 0 |

After

| Name | Total Allocations | Self Allocations | Total Size (Bytes) | Self Size (Bytes) |
| --- | --- | --- | --- | --- |
| FeatureFlags.IsEnabled | 215 | 0 | 13,330 | 0 |

Delta

  • FeatureFlags.IsEnabled(string, IEnvironment) – Total allocations dropped by 430 (66.67%)
  • FeatureFlags.IsEnabled(string, IEnvironment) – Total allocated size dropped by 13,760 bytes (50.79%)

BenchmarkDotNet Micro Benchmarks

I also ran micro benchmarks to compare the performance of the new implementation with the old one.
Source code used: https://github.com/kshyju/Benchmarks/blob/9fe638a43458591991317a72af4b7b7cd30ed6a9/src/Benchmarks.ConsoleApp/StringSplitBenchmarks.cs#L6

Windows

Benchmark Report

| Method | Mean | Allocated |
| --- | --- | --- |
| ContainsUsingStringSplit_Short_Present | 41.45 ns | 64 B |
| ContainsToken_Short_Present | 11.52 ns | - |
| ContainsUsingStringSplit_Short_Absent | 44.09 ns | 64 B |
| ContainsToken_Short_Absent | 10.59 ns | - |
| ContainsUsingStringSplit_Medium_Present | 121.10 ns | 272 B |
| ContainsToken_Medium_Present | 31.06 ns | - |
| ContainsUsingStringSplit_Medium_Absent | 118.37 ns | 280 B |
| ContainsToken_Medium_Absent | 32.21 ns | - |
| ContainsUsingStringSplit_Empty | 0.33 ns | - |
| ContainsToken_Empty | 1.59 ns | - |

Linux (Ubuntu)

Benchmark Report

| Method | Mean | Allocated |
| --- | --- | --- |
| ContainsUsingStringSplit_Short_Present | 39.69 ns | 64 B |
| ContainsToken_Short_Present | 10.58 ns | - |
| ContainsUsingStringSplit_Short_Absent | 42.72 ns | 64 B |
| ContainsToken_Short_Absent | 10.63 ns | - |
| ContainsUsingStringSplit_Medium_Present | 135.73 ns | 272 B |
| ContainsToken_Medium_Present | 31.16 ns | - |
| ContainsUsingStringSplit_Medium_Absent | 138.68 ns | 280 B |
| ContainsToken_Medium_Absent | 32.59 ns | - |
| ContainsUsingStringSplit_Empty | 0.35 ns | - |
| ContainsToken_Empty | 1.26 ns | - |

Pull request checklist

IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.

  • Backporting to the in-proc branch is not required
    • Otherwise: Link to backporting PR
  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • My changes do not require diagnostic events changes
    • Otherwise: I have added/updated all related diagnostic events and their documentation (Documentation issue linked to PR)
  • I have added all required tests (Unit tests, E2E tests)

@kshyju kshyju requested a review from a team as a code owner May 22, 2025 22:50
@kshyju kshyju requested a review from safihamid May 22, 2025 22:50

    public sealed class StringUtilsTests
    {
        [Theory]
        [InlineData("FeatureA,FeatureB,FeatureC", "FeatureB", ',', true)]

Member

Should we account for whitespace and empty tokens?

"FeatureA, FeatureB,FeatureC"

Member Author

I intentionally left that out to maintain parity with the previous version, which also didn’t handle trimming. In my benchmark repository, I’ve included a unit test that compares the outputs of both implementations to ensure the functionality remains consistent.

https://github.com/kshyju/Benchmarks/blob/9fe638a43458591991317a72af4b7b7cd30ed6a9/tests/Benchmarks.Tests/Tests.cs#L17-L26
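
To make the parity point concrete, here is a minimal sketch of a split-based lookup that does no trimming. It is an assumption about what the previous approach roughly looked like, not the actual code: the class name `SplitBaselineSketch` and the case-insensitive ordinal comparison are hypothetical. The method name mirrors the benchmark name above.

```csharp
using System;
using System.Linq;

// Hypothetical baseline: split on the delimiter, then look for an exact
// (case-insensitive) match. Tokens are NOT trimmed.
public static class SplitBaselineSketch
{
    public static bool ContainsUsingStringSplit(string source, string token, char delimiter)
    {
        if (string.IsNullOrEmpty(source))
        {
            return false;
        }

        // Allocates an array plus one string per token.
        string[] tokens = source.Split(delimiter);
        return tokens.Contains(token, StringComparer.OrdinalIgnoreCase);
    }
}
```

Under these assumptions, searching `"FeatureA, FeatureB,FeatureC"` for `"FeatureB"` returns `false`, because the second token is `" FeatureB"` with a leading space. A replacement that also skips trimming keeps that behavior unchanged.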


    if (separatorIndex >= 0)
    {
        currentToken = remaining.Slice(0, separatorIndex);

Member

Comparing currentToken and searchToken here should save some nanoseconds. You can skip a slice if currentToken matches the target.

Member Author

Okay, I tried that approach and ran benchmarks to compare the two versions.

Linux

https://github.com/kshyju/Benchmarks/actions/runs/15201876152/job/42757382146#step:7:1378

| Method | Mean |
| --- | --- |
| ContainsToken_Short_Present | 11.548 ns |
| ContainsToken_CheckInline_Short_Present | 10.645 ns |
| ContainsToken_Short_Absent | 10.644 ns |
| ContainsToken_CheckInline_Short_Absent | 10.330 ns |
| ContainsToken_Medium_Present | 31.076 ns |
| ContainsToken_CheckInline_Medium_Present | 31.335 ns |
| ContainsToken_Medium_Absent | 32.303 ns |
| ContainsToken_CheckInline_Medium_Absent | 32.730 ns |
| ContainsToken_Empty | 1.332 ns |
| ContainsToken_CheckInline_Empty | 1.338 ns |

Windows

https://github.com/kshyju/Benchmarks/actions/runs/15201876152/job/42757382306#step:7:1387

| Method | Mean |
| --- | --- |
| ContainsToken_Short_Present | 11.196 ns |
| ContainsToken_CheckInline_Short_Present | 10.909 ns |
| ContainsToken_Short_Absent | 10.588 ns |
| ContainsToken_CheckInline_Short_Absent | 10.291 ns |
| ContainsToken_Medium_Present | 30.436 ns |
| ContainsToken_CheckInline_Medium_Present | 36.290 ns |
| ContainsToken_Medium_Absent | 31.417 ns |
| ContainsToken_CheckInline_Medium_Absent | 30.823 ns |
| ContainsToken_Empty | 1.586 ns |
| ContainsToken_CheckInline_Empty | 1.593 ns |

Comparing them:

| Platform | Input | Match | Variable (ns) | Inline (ns) | Δ (ns) | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| Linux | Short | Present | 11.55 | 10.65 | 0.90 | Inline |
| Linux | Short | Absent | 10.64 | 10.33 | 0.31 | Inline |
| Linux | Medium | Present | 31.08 | 31.34 | -0.26 | Variable |
| Linux | Medium | Absent | 32.30 | 32.73 | -0.43 | Variable |
| Linux | Empty | - | 1.33 | 1.34 | -0.01 | Tie |
| Windows | Short | Present | 11.20 | 10.91 | 0.29 | Inline |
| Windows | Short | Absent | 10.59 | 10.29 | 0.30 | Inline |
| Windows | Medium | Present | 30.44 | 36.29 | -5.85 | Variable |
| Windows | Medium | Absent | 31.42 | 30.82 | 0.60 | Inline |
| Windows | Empty | - | 1.59 | 1.59 | 0.00 | Tie |

TLDR:

  • The inline version is slightly faster for short inputs, but the difference is always less than 1 nanosecond.
  • For medium and empty inputs, the two versions are within about 0.6 ns of each other, except for Windows Medium Present, where the variable version is 5.85 ns faster.

I personally prefer the variable version (currently in the PR) because its two sections make it more readable: the top section handles token extraction, and the bottom section handles the comparison. Do you find the inline version more readable? I am open to switching if more people prefer it.
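
For readers comparing the two shapes, here is a minimal sketch of what an inline-check variant could look like. It is not the benchmarked `ContainsToken_CheckInline` code: the class and method names and the case-insensitive ordinal comparison are assumptions for illustration. The comparison happens as soon as a token is found, so a match returns before the remainder is sliced.

```csharp
using System;

// Hypothetical inline-check variant: compare each token where it is found
// instead of storing it in a variable and comparing in a separate step.
public static class InlineCheckSketch
{
    public static bool ContainsTokenInline(string source, string token, char delimiter)
    {
        if (string.IsNullOrEmpty(source))
        {
            return false;
        }

        ReadOnlySpan<char> remaining = source.AsSpan();
        ReadOnlySpan<char> searchToken = token.AsSpan();

        while (true)
        {
            int separatorIndex = remaining.IndexOf(delimiter);

            if (separatorIndex < 0)
            {
                // Last token: the whole remainder is the candidate.
                return remaining.Equals(searchToken, StringComparison.OrdinalIgnoreCase);
            }

            if (remaining.Slice(0, separatorIndex).Equals(searchToken, StringComparison.OrdinalIgnoreCase))
            {
                // Matched, so the slice that advances past the delimiter is skipped.
                return true;
            }

            remaining = remaining.Slice(separatorIndex + 1);
        }
    }
}
```

The trade-off discussed here is visible in the sketch: the extraction and comparison logic are interleaved in one place rather than split into two sections.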

Member

Interesting, I didn't expect the inline version to take longer than the variable version, especially when there's a match (we save one slice). I find the variable version more readable, since the inline version needs another level of nesting. I am fine if you want to keep your existing version.

Member

I ran your test locally; the inline version looks promising in all scenarios.

image

Member Author

Yeah, I expect results from a local dev workstation to have higher variance/noise (std dev) than results from the servers. Across all three sets of results, the Linux server appears to have the least variance. I still vote for readability over 5 ns.

Contributor

@safihamid safihamid left a comment

:shipit:

Successfully merging this pull request may close these issues.

Optimize FeatureFlags.IsEnabled