Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timeouts not working when not LEADING or FOLLOWING #2057

Merged
merged 1 commit into from
Jan 10, 2025

Conversation

tylerkaraszewski
Copy link
Contributor

Details

When a server is neither LEADING nor FOLLOWING we don't check for command timeouts. Normally this doesn't matter, because a server should become LEADING or FOLLOWING very soon, but we run into this when shutting down clusters in tests. If you are the last node left in a cluster, you won't have quorum, and thus cannot process commands.

This is why we've changed BedrockTester to take a timeout in the same PR, because we do this in auth:
https://github.com/Expensify/Auth/blob/0751f4eefd49945a009cc219152fe3b51a684502/test/test/AuthTester.cpp#L219

Because when this is done via Query command (as it is for HC-Tree) we can send a Query to a node that is never able to process it, and the node will never shut down.

I think in production, this is relatively unlikely to occur, but I still think it's "more correct" to honor timeouts in this state.

There are some particulars here about not having a DB handle when we do this, which is generally required to timeout commands because we attach a CommitCount message to even timed out command responses. This skips attaching that line in this particular case, and changes how some parameters are passed/stored/etc in BedrockCore to accomodate not always having an SQLite object available.

Fixed Issues

Found while working on https://github.com/Expensify/Expensify/issues/337537

Tests

Existing


Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

@tylerkaraszewski tylerkaraszewski self-assigned this Jan 10, 2025
@tylerkaraszewski tylerkaraszewski changed the title Fix timeouts not worknig when not leading or follwoing Fix timeouts not worknig when not LEADING or FOLLOWING Jan 10, 2025
@tylerkaraszewski tylerkaraszewski changed the title Fix timeouts not worknig when not LEADING or FOLLOWING Fix timeouts not working when not LEADING or FOLLOWING Jan 10, 2025
@tylerkaraszewski tylerkaraszewski merged commit 4412c73 into main Jan 10, 2025
1 check passed
@tylerkaraszewski tylerkaraszewski deleted the tyler-fix-timeout-not-leading-following branch January 10, 2025 22:19
@pecanoro
Copy link
Contributor

Manually commenting since the deploy script is broken, this was deployed to webrock

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants