[CORE-16405] cl/test: uncap target consumer reads for compacted syncs#31001

Open
andrwng wants to merge 1 commit into redpanda-data:dev from andrwng:core-16405
andrwng commented Jul 2, 2026

The target consumer carried a fixed read-count cap (max_msgs=msg_count) while the source consumer did not. Under compaction, workload completion is gated on per-partition offset parity (max_offsets_match), which a bounded read cannot reach. Re-reads after a rebalance spend the budget, and a partition whose leadership churns has its tail produced and replicated late. The consumer therefore stops below max_offsets_produced, and the workload never finishes reading to the high watermark (HWM) even though replication has finished.
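As a rough illustration of the stall, here is a toy model of a single partition. The function name and its parameters are hypothetical and do not exist in the test harness. It only assumes that every re-read record counts against the read budget.

```python
def highest_offset_reached(produced_max_offset, max_msgs, rereads):
    """Toy model: highest offset a consumer reaches on one partition.

    produced_max_offset: last offset produced to the partition (0-based).
    max_msgs: read budget, or None for an uncapped (tailing) consumer.
    rereads: records consumed a second time after rebalances; each one
             is assumed to spend one unit of the budget.
    Returns -1 if the budget is exhausted before any new record is read.
    """
    total_records = produced_max_offset + 1
    if max_msgs is None:
        # An uncapped consumer keeps tailing until it reaches the last offset.
        return produced_max_offset
    unique_reads = max(0, min(total_records, max_msgs - rereads))
    return unique_reads - 1


capped_clean = highest_offset_reached(99, max_msgs=100, rereads=0)
capped_with_rereads = highest_offset_reached(99, max_msgs=100, rereads=10)
uncapped = highest_offset_reached(99, max_msgs=None, rereads=10)
```

In this model the capped consumer reaches parity only when nothing is re-read. With ten re-reads it stops ten offsets short, while the uncapped consumer still reaches the last produced offset.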

This commit drops the cap on the compacted path so the target consumer tails to parity like the source consumer; the non-compacted path keeps its count-based cap unchanged.

To avoid this in the future, we also now reject max_msgs with use_compaction up front so the incompatibility fails fast instead of stalling for the progress timeout.
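A minimal sketch of such a fail-fast check, written as a standalone function for illustration. The helper name is hypothetical, and it raises ValueError where the change in this PR uses an assert.

```python
def check_consumer_properties(use_compaction, consumer_properties):
    """Hypothetical helper: reject max_msgs when compaction is enabled."""
    if use_compaction and "max_msgs" in consumer_properties:
        raise ValueError(
            "max_msgs is incompatible with use_compaction: completion "
            "requires per-partition offset parity, which a bounded read "
            "may never reach"
        )


def is_accepted(use_compaction, consumer_properties):
    """Return True if the combination passes the check, False otherwise."""
    try:
        check_consumer_properties(use_compaction, consumer_properties)
    except ValueError:
        return False
    return True
```

Checking at setup time means a misconfigured test fails immediately with a descriptive message instead of waiting out the progress timeout.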

This attempts to fix ShadowLinkingRandomOpsTest.test_node_operations, which is quite flaky.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • None

Copilot AI review requested due to automatic review settings July 2, 2026 18:16

Copilot AI left a comment

Pull request overview

Adjusts the cluster linking workload verifier’s target consumer behavior under compaction so it can read until per-partition offset parity (instead of stopping early due to a message-count cap), addressing hangs/flakiness when compaction and rebalances/leadership churn occur.

Changes:

  • Add a guard rejecting consumer_properties["max_msgs"] when use_compaction=True to fail fast instead of stalling.
  • Remove the target consumer’s fixed max_msgs=self.msg_count cap when use_compaction=True (keep the cap for the non-compacted path).
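The second change can be pictured as building the target consumer's keyword arguments conditionally. This is a sketch under assumptions: the helper name is hypothetical, and only the max_msgs, continuous, and consumer_properties names come from the PR.

```python
def target_consumer_kwargs(use_compaction, msg_count, consumer_properties):
    """Hypothetical helper: keyword arguments for the target consumer."""
    # Copy so the caller's properties are not mutated.
    kwargs = dict(consumer_properties)
    kwargs["continuous"] = True
    if not use_compaction:
        # Non-compacted path keeps its count-based cap.
        kwargs["max_msgs"] = msg_count
    return kwargs


compacted = target_consumer_kwargs(True, 1000, {})
non_compacted = target_consumer_kwargs(False, 1000, {})
```

On the compacted path no max_msgs key is set at all, so the consumer is free to tail until the offsets match.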
Comments suppressed due to low confidence (1)

tests/rptest/tests/cluster_linking_test_base.py:299

  • max_msgs is passed explicitly to KgoVerifierConsumerGroupConsumer and **self.consumer_properties is expanded into the same call. If consumer_properties ever includes max_msgs (even None), this will raise TypeError: got multiple values for keyword argument 'max_msgs'. Consider copying and popping max_msgs out of the properties before expanding them.
            group_name=f"target-cg-{self._instance_id}",
            nodes=self.preallocated_nodes,
            continuous=True,
            **self.consumer_properties,
        )
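The failure mode described in this suppressed comment, and the suggested copy-and-pop fix, can be reproduced with a stand-in function. Everything below is illustrative: make_consumer is not the real KgoVerifierConsumerGroupConsumer, and the trace_logs key is an invented example property.

```python
def make_consumer(group_name, max_msgs=None, continuous=False, **extra):
    """Stand-in for a consumer constructor (hypothetical)."""
    return {
        "group_name": group_name,
        "max_msgs": max_msgs,
        "continuous": continuous,
        **extra,
    }


def without_max_msgs(consumer_properties):
    """Return a copy of the properties with max_msgs removed."""
    props = dict(consumer_properties)
    props.pop("max_msgs", None)
    return props


props = {"max_msgs": None, "trace_logs": True}

# Passing max_msgs explicitly and again through ** raises TypeError,
# even though the value in the dict is None.
try:
    make_consumer("target-cg", max_msgs=10, continuous=True, **props)
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True

# Popping the key from a copy avoids the clash and leaves props untouched.
consumer = make_consumer(
    "target-cg", max_msgs=10, continuous=True, **without_max_msgs(props)
)
```

Working on a copy matters here because the same properties dictionary may be reused for other consumers in the test.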

Comment on lines +235 to +241
# When using compaction, the completion criteria examines per-partition
# offsets, which may be at odds with having a max_msgs set.
assert not (self.use_compaction and "max_msgs" in self.consumer_properties), (
    "max_msgs is incompatible with use_compaction: completion requires "
    "per-partition offset parity, which a bounded read may never reach. "
    "Let the consumer tail (continuous) instead."
)
andrwng commented Jul 2, 2026

/ci-repeat 5
skip-redpanda-build
skip-units
skip-rebase
dt-repeat=10
tests/rptest/tests/shadow_linking_rnot_test.py::ShadowLinkingRandomOpsTest.test_node_operations
