MINOR: Prevent re-join flakiness in test_fencing_static_consumer by ensuring conflicting static consumers terminate #20772
base: trunk
Test command: Before: After:
    return consumer

def _node_failed_with_unreleased_instance_id(self, node):
    cmd = "grep -q 'UnreleasedInstanceIdException' %s" % VerifiableConsumer.LOG_FILE
Given that the error is subject to change, would it be more reliable to check the process ID (PID) directly?
Thanks for the suggestion!
Given backward compatibility, checking the PID makes more sense.
I’ve made the corresponding adjustment.
Reran the test to reflect the recent changes. Also tested other parameter combinations, and they were not affected.
Related discussion:
#20594 (review)
Problem
The test OffsetValidationTest.test_fencing_static_consumer failed when
executed with fencing_stage=stable and group_protocol=consumer. It timed
out while waiting for the group to become empty: the conflicting static
consumers re-joined after the original members stopped, which kept the
group non-empty.
Fix
For the consumer-protocol path, the test now waits for all conflicting
consumer processes to terminate before stopping the original static
members. This ensures that each conflicting consumer is fully fenced
and cannot re-join the group after the original members stop.
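The ordering described above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: the helper names (wait_until, all_terminated, spawn_stand_in) are hypothetical, and short-lived Python subprocesses stand in for the conflicting VerifiableConsumer processes. The point is only the sequencing: poll until every conflicting process has exited, and only then proceed to stop the original members.

```python
import subprocess
import sys
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg="condition not met"):
    """Poll condition() until it returns True, or raise after timeout_sec."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def all_terminated(procs):
    """True once every process has exited (poll() returns its exit code)."""
    return all(p.poll() is not None for p in procs)


def spawn_stand_in(lifetime_sec):
    # Hypothetical stand-in for a conflicting static consumer that
    # exits on its own shortly after being fenced.
    code = "import time; time.sleep(%s)" % lifetime_sec
    return subprocess.Popen([sys.executable, "-c", code])


conflicting = [spawn_stand_in(0.2), spawn_stand_in(0.4)]

# Block until every conflicting process is gone, so none can re-join later.
wait_until(lambda: all_terminated(conflicting), timeout_sec=10,
           err_msg="conflicting consumers did not terminate")

# Only at this point would the test stop the original static members.
```

Checking process liveness rather than grepping the log for an exception name keeps the wait independent of the exact error text, which is the concern raised in the review thread.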