xdsclient: fix unexpectedly large LoadReportInterval in initial load report request #8348

Merged
purnesh42H merged 4 commits into grpc:master from load-store-init-last-report-at
May 26, 2025

Conversation

purnesh42H
Contributor

@purnesh42H purnesh42H commented May 22, 2025

Internal bug: b/416260484

The lastReportedAt field is left uninitialized during reporter construction, so it defaults to the zero time.Time value. As a result, the LoadReportInterval for the initial load report (calculated as Now() - lastReportedAt) is excessively large. Initializing lastReportedAt to Now() during construction ensures that the LoadReportInterval of the first report is a close approximation of the actual elapsed time.

RELEASE NOTES:

  • xdsclient: fix unexpectedly large LoadReportInterval in initial load report request.

@purnesh42H purnesh42H requested review from dfawley and easwars and removed request for dfawley May 22, 2025 07:40
@purnesh42H purnesh42H requested a review from dfawley May 22, 2025 07:40
@purnesh42H purnesh42H changed the title from "xdsclient: fix unexpectedly large in initial load report request" to "xdsclient: fix unexpectedly large LoadReportInterval in initial load report request" May 22, 2025

codecov bot commented May 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.29%. Comparing base (6995ef2) to head (6bdf940).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8348      +/-   ##
==========================================
+ Coverage   82.16%   82.29%   +0.13%     
==========================================
  Files         419      419              
  Lines       42034    42052      +18     
==========================================
+ Hits        34537    34607      +70     
+ Misses       6023     5987      -36     
+ Partials     1474     1458      -16     
Files with missing lines Coverage Δ
xds/internal/clients/lrsclient/load_store.go 93.19% <100.00%> (+0.10%) ⬆️
xds/internal/xdsclient/load/store.go 86.47% <100.00%> (+0.06%) ⬆️

... and 29 files with indirect coverage changes

@purnesh42H purnesh42H added this to the 1.73 Release milestone May 22, 2025
@purnesh42H purnesh42H added the Area: xDS Includes everything xDS related, including LB policies used with xDS. label May 22, 2025
Contributor

@arjan-bal arjan-bal left a comment

Can you please add a test for this to prevent regressions?

@purnesh42H
Contributor Author

Can you please add a test for this to prevent regressions?

We can only check that LoadReportInterval is positive and not excessively large. We can't compare exactly since the calculation involves time.Now(). I have added verifications to the existing e2e tests, with some tolerance, to ensure that the reportInterval of the first report and of subsequent reports is a small, positive duration that is still greater than the configured server LoadReportingInterval.

@purnesh42H purnesh42H requested a review from arjan-bal May 22, 2025 10:02
@purnesh42H purnesh42H removed their assignment May 22, 2025
@dfawley
Member

dfawley commented May 22, 2025

We can only check that LoadReportInterval is positive and not excessively large. We can't compare exactly since the calculation involves time.Now().

Without looking at the details here... Can we hook time.Now and make it return predictable values for our tests?

@purnesh42H purnesh42H force-pushed the load-store-init-last-report-at branch from 012a5bc to 6a3ebaf Compare May 23, 2025 08:57
@purnesh42H
Contributor Author

purnesh42H commented May 23, 2025

We can only check that LoadReportInterval is positive and not excessively large. We can't compare exactly since the calculation involves time.Now().

Without looking at the details here... Can we hook time.Now and make it return predictable values for our tests?

Added the hook. Though that limits us to verifying a predictable report interval only in the load store unit test and not in the e2e tests (i.e., verifying the report interval that was reported to the LRS server), that's probably fine.

@purnesh42H purnesh42H force-pushed the load-store-init-last-report-at branch from 6a3ebaf to 65fd12d Compare May 23, 2025 13:42
@purnesh42H purnesh42H force-pushed the load-store-init-last-report-at branch from 65fd12d to b7c391d Compare May 23, 2025 13:44
@dfawley
Member

dfawley commented May 23, 2025

Though that limits us to verifying a predictable report interval only in the load store unit test and not in the e2e tests

If needed, you can hop through an internal package to be able to set from e2e tests.

Member

@dfawley dfawley left a comment

LGTM, just a couple very minor things.

Comment on lines 501 to 502
wantInterval := 5 * time.Second
if stats1.reportInterval != wantInterval {
Member

Optional:

I think the best way I've seen recommended for writing this kind of test comparison is:

if got, want := stats1.reportInterval, 5 * time.Second; got != want {
	t.Fatalf("blah blah = %v; want %v", got, want)
}

Contributor Author

Done

@@ -25,6 +25,9 @@ import (
"time"
)

// clockNow is used to get the current time. It can be overridden in tests.
var clockNow = time.Now
Member

Sorry for the nit, but would you mind renaming this to timeNow to match packagename.SymbolName but without the dot? That's the convention we generally use.

Contributor Author

Done

@@ -25,6 +25,9 @@ import (

const negativeOneUInt64 = ^uint64(0)

// clockNow is used to get the current time. It can be overridden in tests.
var clockNow = time.Now
Member

As above.

currentTime = currentTime.Add(5 * time.Second)
stats1 := store.Stats(nil)

if stats1 == nil {
Member

It seems like you want even more from this check, since it's a slice that you dereference element 0 of immediately below.

if len(stats1) == 0 { // or != 1 ?
	t.Fatalf("store.Stats(nil) = %v; want len() >= 1", stats1)  // or == 1?
}

Contributor Author

Done

@dfawley dfawley assigned purnesh42H and unassigned easwars and dfawley May 23, 2025
@purnesh42H
Contributor Author

Though that limits us to verifying a predictable report interval only in the load store unit test and not in the e2e tests

If needed, you can hop through an internal package to be able to set from e2e tests.

I did it for the generic lrs client tests but left it for the internal one since that will be external soon.

@purnesh42H purnesh42H force-pushed the load-store-init-last-report-at branch from ef2a9f6 to 6bdf940 Compare May 26, 2025 04:12
@purnesh42H purnesh42H merged commit e3ca7f9 into grpc:master May 26, 2025
23 of 24 checks passed
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Bug
4 participants