Bugfix: deduplicate concurrent resolver cache requests with singleflight. #9365
Open
twoGiants wants to merge 1 commit into tektoncd:main from
twoGiants (Member, Author) commented:
@waveywaves check out the fix! 😺
aThorp96 (Member) reviewed on Feb 11, 2026 and left a comment:
Haven't gone through the tests yet, but the fix looks good to me!
85f2292 to ffba4f2
feea6ff to db4118b
twoGiants (Member, Author) commented:
/retest
vdemeester reviewed twice on Feb 16, 2026
Member commented:
/retest
618f491 to 5fefa6d
Before this change, cache Get and Add were separate operations, creating a TOCTOU race: concurrent requests for the same resource could all miss the cache and each trigger a remote resolution.

Merge Get/Add/GetFromCacheOrResolve into a single GetCachedOrResolveFromRemote method on resolverCache that wraps the resolve callback with golang.org/x/sync/singleflight. Only one in-flight resolution per cache key proceeds; all concurrent callers share its result.

Other changes in this commit:
- Use strings.Builder in generateCacheKey instead of string concatenation
- Remove unused resolverType param from ShouldUse
- Rework e2e tests to verify caching by counting actual registry GET requests (via logs and metrics) instead of checking resolver log messages
- Add multi-replica resolver test (4 replicas, 200 TaskRuns)
- Allow -run flag to bypass category filtering in TestMain

Issue tektoncd#9364

Signed-off-by: Stanislav Jakuschevskij <stas@two-giants.com>
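For readers unfamiliar with the pattern, here is a minimal sketch of the "one in-flight resolution per cache key" idea from the commit message. It is not the code in this PR: to stay runnable with the standard library only, it hand-rolls a small in-flight table instead of importing golang.org/x/sync/singleflight, and the names `dedupCache`, `getCachedOrResolve`, and `countResolves` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// call tracks one in-flight resolution for a cache key.
type call struct {
	done chan struct{} // closed once the resolution has finished
	val  string
	err  error
}

// dedupCache is a hypothetical stand-in for resolverCache: a map-backed cache
// plus a table of in-flight calls, roughly what singleflight provides.
type dedupCache struct {
	mu       sync.Mutex
	entries  map[string]string
	inflight map[string]*call
}

func newDedupCache() *dedupCache {
	return &dedupCache{entries: map[string]string{}, inflight: map[string]*call{}}
}

// getCachedOrResolve returns the cached value for key. On a miss, the first
// caller runs resolve; callers arriving meanwhile wait and share its result.
// The cache check and the in-flight registration happen under one lock, so
// there is no window between "check" and "use".
func (c *dedupCache) getCachedOrResolve(key string, resolve func() (string, error)) (string, error) {
	c.mu.Lock()
	if v, ok := c.entries[key]; ok {
		c.mu.Unlock()
		return v, nil
	}
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-cl.done // wait for the leader to finish
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = resolve() // only the leader reaches the remote

	c.mu.Lock()
	if cl.err == nil {
		c.entries[key] = cl.val
	}
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done)
	return cl.val, cl.err
}

// countResolves fires n concurrent lookups for one key and reports how many
// times the simulated remote resolution actually ran.
func countResolves(n int) int64 {
	c := newDedupCache()
	var resolves int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.getCachedOrResolve("example-key", func() (string, error) {
				atomic.AddInt64(&resolves, 1)
				return "resolved", nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&resolves)
}

func main() {
	fmt.Println(countResolves(50)) // prints 1
}
```

Unlike this sketch, the real singleflight package also handles a panicking callback, which is one reason to prefer the library over a hand-rolled table.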
5fefa6d to 809b23e
twoGiants (Member, Author) commented:
We're good, take another look: @vdemeester @aThorp96. I updated the code, answered the comments, and marked them as resolved.
Changes
Fix race condition in resolver cache where concurrent requests for the same resource bypass the cache and independently resolve upstream (cache stampede / thundering herd).
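To make the race concrete, here is a small sketch of a check-then-store cache, assuming a plain map guarded by a mutex. It is illustrative only and not taken from the resolver code; `stampede` is a hypothetical helper, and the barrier exists purely to make the bad interleaving deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// stampede runs n concurrent callers against a cache whose lookup and store
// are separate steps. A barrier forces every caller to observe the miss
// before any caller stores, which is the interleaving behind the bug.
// It returns how many simulated remote resolutions were performed.
func stampede(n int) int {
	cache := map[string]string{}
	var mu sync.Mutex
	var checked, done sync.WaitGroup
	checked.Add(n)
	done.Add(n)
	resolves := 0

	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()

			// Step 1: lookup (the "check").
			mu.Lock()
			_, hit := cache["example-key"]
			mu.Unlock()

			// Barrier: wait until all callers have finished their lookup.
			checked.Done()
			checked.Wait()
			if hit {
				return
			}

			// Step 2: resolve and store (the "use"). Each step is locked on
			// its own, but nothing ties the two steps together.
			mu.Lock()
			resolves++ // stands in for a remote resolution
			cache["example-key"] = "resolved"
			mu.Unlock()
		}()
	}
	done.Wait()
	return resolves
}

func main() {
	fmt.Println(stampede(8)) // prints 8: every caller resolved upstream
}
```

Each individual map access is safe here; the problem is only that the miss and the store are not one atomic step.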
What changed
Before this change, cache lookup (`Get`) and storage (`Add`) were separate operations exposed on `resolverCache`, and the orchestration lived in `GetFromCacheOrResolve` in `operations.go`.

Now the entire check-or-resolve flow is a single `GetCachedOrResolveFromRemote` method on `resolverCache` that wraps the resolve callback with `golang.org/x/sync/singleflight`. Only one in-flight resolution proceeds per cache key; all concurrent callers share its result. The separate `Get`, `Add`, `Remove`, and `GetFromCacheOrResolve` public methods are removed; the cache surface is now just `GetCachedOrResolveFromRemote` and `Clear`.

Other changes
- Use `strings.Builder` in `generateCacheKey` instead of repeated string concatenation
- Remove unused `resolverType` parameter from `ShouldUse`
- Rework e2e tests to verify caching by counting actual registry GET requests (via logs and `/metrics` endpoint) instead of checking resolver log messages
- Allow `-run` flag to bypass category filtering in `TestMain` so individual tests can run in isolation
- ... `test/README.md`

@vdemeester also found and fixed a small bug in the e2e test annotations. I wasn't able to run tests in isolation. Now the little fix does the magic 🐱.
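As a rough illustration of the `-run` bypass mentioned above, here is one way such a check could look. This is a sketch under assumptions, not the code in this PR: `go test` registers the flag as `test.run`, while `runFilterSet`, `shouldApplyCategoryFilter`, `newTestFlags`, and the `"resolvers"` category name are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// runFilterSet reports whether a -run pattern was supplied. In a real test
// binary the flag lives on flag.CommandLine under the name "test.run".
func runFilterSet(fs *flag.FlagSet) bool {
	f := fs.Lookup("test.run")
	return f != nil && f.Value.String() != ""
}

// shouldApplyCategoryFilter is a hypothetical helper: category filtering is
// skipped when the caller asked for specific tests via -run.
func shouldApplyCategoryFilter(fs *flag.FlagSet, category string) bool {
	return category != "" && !runFilterSet(fs)
}

// newTestFlags builds a flag set that mimics what the testing package
// registers, so the sketch runs outside of `go test`.
func newTestFlags(args ...string) *flag.FlagSet {
	fs := flag.NewFlagSet("e2e", flag.ContinueOnError)
	fs.String("test.run", "", "run only tests matching the pattern")
	_ = fs.Parse(args)
	return fs
}

func main() {
	// No -run given: the category filter applies.
	fmt.Println(shouldApplyCategoryFilter(newTestFlags(), "resolvers")) // prints true
	// -run given: the category filter is bypassed.
	fmt.Println(shouldApplyCategoryFilter(newTestFlags("-test.run=TestCache"), "resolvers")) // prints false
}
```

Inside an actual `TestMain`, the same lookup would presumably be done against `flag.CommandLine` after `flag.Parse()`.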
Fixes #9364
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
/kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
Release Notes