test_neon_cli_basics is flaky due to unexpected warning #10381
Comments
I'd suggest checking the timing of storage controller startup vs. pageserver startup -- I think neon_local is meant to do the controller first and wait for it to be ready, maybe that isn't happening. Or, maybe the test is directly controlling individual services and restarting things in the wrong order.
Thank you for looking at this! I've reduced the test to the following:
and I'm still getting those warnings. It really looks like the pageserver is starting first. A failed test log contains:
pageserver.log:
storage_controller.log:
In fact, the warning can be easily produced just by neon/control_plane/src/bin/neon_local.rs Line 1777 in 47c1640
As the JoinSet docs say: "The set is not ordered, and the tasks will be returned in the order they complete."
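To illustrate the ordering problem outside of neon_local, here is a minimal Python asyncio sketch. It is only an analogy to the JoinSet behaviour, not the neon_local code: `start_service`, the service names as strings, and the sleep delays are all made up for the demo. When both startups are launched concurrently and collected as they complete, the faster service (here the pageserver) is up first; awaiting the controller before launching the pageserver enforces the intended order.

```python
import asyncio


async def start_service(name: str, startup_delay: float, ready: list[str]) -> None:
    """Hypothetical stand-in for starting a service; the delay simulates startup time."""
    await asyncio.sleep(startup_delay)
    ready.append(name)  # record the moment this service became ready


async def start_unordered() -> list[str]:
    """Launch both services at once and collect them in completion order."""
    ready: list[str] = []
    tasks = [
        asyncio.create_task(start_service("storage_controller", 0.05, ready)),
        asyncio.create_task(start_service("pageserver", 0.01, ready)),
    ]
    # Like an unordered task set: results arrive as tasks finish,
    # regardless of the order in which they were spawned.
    for finished in asyncio.as_completed(tasks):
        await finished
    return ready


async def start_ordered() -> list[str]:
    """Wait for the controller to be ready before starting the pageserver."""
    ready: list[str] = []
    await start_service("storage_controller", 0.05, ready)
    await start_service("pageserver", 0.01, ready)
    return ready


if __name__ == "__main__":
    print("unordered:", asyncio.run(start_unordered()))
    print("ordered:  ", asyncio.run(start_ordered()))
```

With these made-up delays the unordered variant reports the pageserver as ready before the storage controller, which is the situation the warning seems to point at.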
When running tests with sanitizers enabled on my laptop, I see test_neon_cli_basics fail on every run (on a more performant workstation it fails only from time to time) with:
More from pageserver.log:
The same can be seen in CI:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10104/12286480456/index.html#/testresult/387bce70aea2aae3
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9950/12089114170/index.html#/testresult/a79150a64b0cfba
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9878/12011001635/index.html#suites/48b4046d39093f7675bf477e070db277/efbdabcef0dc3dee/history
...
(the last run was considered failed)
I can see a similar message added to DEFAULT_PAGESERVER_ALLOWED_ERRORS:
The message from test_neon_cli_basics differs in only one component: "init_tenant_mgr" vs "deletion backend".
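If the chosen fix were to widen the existing allowed-error entry rather than fix the startup order, it could look roughly like the sketch below. This is an assumption-laden illustration: `WARNING_TAIL` is a placeholder, not the real warning text, and `allowed_pattern` / `is_allowed` are hypothetical helpers, not the test framework's actual API. The only thing taken from the issue is that the two messages differ in one component, "init_tenant_mgr" vs "deletion backend".

```python
import re

# Placeholder only: the real warning text is not reproduced here.
WARNING_TAIL = "placeholder warning text"


def allowed_pattern(components: list[str]) -> re.Pattern:
    """Build one regex that accepts the warning from any of the given components."""
    alternatives = "|".join(re.escape(c) for c in components)
    return re.compile(rf".*(?:{alternatives}).*{re.escape(WARNING_TAIL)}.*")


def is_allowed(line: str, patterns: list[re.Pattern]) -> bool:
    """Return True if the log line matches at least one allowed pattern."""
    return any(p.match(line) for p in patterns)


if __name__ == "__main__":
    narrow = [allowed_pattern(["deletion backend"])]
    wide = [allowed_pattern(["deletion backend", "init_tenant_mgr"])]
    # Made-up log line in a made-up format, for demonstration only.
    line = f"WARN init_tenant_mgr: {WARNING_TAIL}"
    print(is_allowed(line, narrow), is_allowed(line, wide))
```

Allow-listing only hides the symptom, though; if the pageserver really starts before the storage controller, fixing the startup order is probably the better option.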