[CORE-16405] cl/test: uncap target consumer reads for compacted syncs #31001
Open
andrwng wants to merge 1 commit into
The target consumer carried a fixed read-count cap (max_msgs=msg_count) while the source consumer did not. Under compaction, workload completion is gated on per-partition offset parity (max_offsets_match), which a bounded read cannot reach: rebalance re-reads spend the budget, and a partition whose leadership churns has its tail produced and replicated late, so the consumer stops below max_offsets_produced and the workload never finishes reading to the HWM despite replication finishing.
This commit drops the cap on the compacted path so the target consumer tails to parity like the source consumer; the non-compacted path keeps its count-based cap unchanged.
To avoid this in the future, we also now reject max_msgs with use_compaction up front so the incompatibility fails fast instead of stalling for the progress timeout.
This attempts to fix ShadowLinkingRandomOpsTest.test_node_operations, which is quite flaky.
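The budget argument above can be illustrated with a toy model. This is only a sketch: `read_with_budget` and its `rereads` parameter are hypothetical names made up for illustration, and the loop does not reflect how kgo-verifier actually consumes. It shows a count-capped reader falling short of the last offset once a rebalance forces it to re-read records, while an uncapped reader reaches it.

```python
def read_with_budget(log_end_offset, max_msgs, rereads):
    """Return the highest offset seen by a reader that stops after max_msgs reads.

    Hypothetical model: `rereads` maps an offset to the number of records the
    reader rewinds after consuming it, standing in for a rebalance that resumes
    from an older committed position. max_msgs=None means no cap.
    """
    budget = max_msgs
    pending = dict(rereads)  # each rewind happens only once
    position = 0
    highest = -1
    while position < log_end_offset:
        if budget is not None:
            if budget == 0:
                break  # cap exhausted before reaching the end of the log
            budget -= 1
        highest = max(highest, position)
        rewind = pending.pop(position, 0)
        position = position + 1 - rewind
    return highest


# Ten records (offsets 0..9), one rebalance after offset 4 that rewinds 3 records.
capped = read_with_budget(10, max_msgs=10, rereads={4: 3})
uncapped = read_with_budget(10, max_msgs=None, rereads={4: 3})
no_rebalance = read_with_budget(10, max_msgs=10, rereads={})
```

With the cap equal to the record count, the three re-read records consume budget that was needed for the tail, so `capped` stops at offset 6 while `uncapped` and `no_rebalance` both reach offset 9.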
Pull request overview
Adjusts the cluster linking workload verifier’s target consumer behavior under compaction so it can read until per-partition offset parity (instead of stopping early due to a message-count cap), addressing hangs/flakiness when compaction and rebalances/leadership churn occur.
Changes:
- Add a guard rejecting consumer_properties["max_msgs"] when use_compaction=True to fail fast instead of stalling.
- Remove the target consumer’s fixed max_msgs=self.msg_count cap when use_compaction=True (keep the cap for the non-compacted path).
Comments suppressed due to low confidence (1)
tests/rptest/tests/cluster_linking_test_base.py:299
max_msgs is passed explicitly to KgoVerifierConsumerGroupConsumer and **self.consumer_properties is expanded into the same call. If consumer_properties ever includes max_msgs (even None), this will raise TypeError: got multiple values for keyword argument 'max_msgs'. Consider copying and popping max_msgs out of the properties before expanding them.
group_name=f"target-cg-{self._instance_id}",
nodes=self.preallocated_nodes,
continuous=True,
**self.consumer_properties,
)
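The copy-and-pop suggestion could look roughly like the sketch below. `make_consumer` is a hypothetical stand-in, not the real KgoVerifierConsumerGroupConsumer constructor, and `debug_logs` is an arbitrary example key; only the keyword-collision behavior is standard Python.

```python
def make_consumer(*, max_msgs=None, continuous=False, **extra):
    """Hypothetical stand-in for a constructor that takes max_msgs as a keyword."""
    return {"max_msgs": max_msgs, "continuous": continuous, **extra}


def build_consumer(consumer_properties, max_msgs):
    # Copy first so the caller's dict is left untouched, then drop the key
    # that is also passed explicitly below.
    props = dict(consumer_properties)
    props.pop("max_msgs", None)
    return make_consumer(max_msgs=max_msgs, continuous=True, **props)


properties = {"max_msgs": None, "debug_logs": True}

# Passing the keyword explicitly and again through ** raises TypeError,
# even though the value in the dict is None.
try:
    make_consumer(max_msgs=100, **properties)
    collided = False
except TypeError:
    collided = True

consumer = build_consumer(properties, max_msgs=100)
```

Popping from a copy rather than from `self.consumer_properties` itself keeps the properties intact for any other consumer built from the same dict.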
Comment on lines +235 to +241
# When using compaction, the completion criteria examines per-partition
# offsets, which may be at odds with having a max_msgs set.
assert not (self.use_compaction and "max_msgs" in self.consumer_properties), (
    "max_msgs is incompatible with use_compaction: completion requires "
    "per-partition offset parity, which a bounded read may never reach. "
    "Let the consumer tail (continuous) instead."
)
Contributor Author
/ci-repeat 5
Backports Required
Release Notes