
rptest: add skip_in_cdt opt-out for CDT runs #30975

Draft
oleiman wants to merge 14 commits into dev from ci/noticket/opt-out-cdt

Conversation

@oleiman

@oleiman oleiman commented Jun 30, 2026

Member

CDT runs every ducktape test on real cloud infrastructure, and a whole-run timeout fails the entire run. Some tests (e.g. exhaustive broker HTTP-API tests) add runtime there without adding cloud-infra coverage.

skip_in_cdt opts out a test method or a whole class; skip_file_in_cdt opts out every test class in a module. Both are gated on CLOUD_PROVIDER != "docker", so they are no-ops locally and in dockerized CI, and default behavior is unchanged. In CDT they attach ducktape's ignore mark, so opted-out tests report as IGNORE rather than vanishing from the run.

The reason argument is documentation only. It is never passed to ducktape's ignore, which would treat it as a parametrization matcher and silently un-skip the test.
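A minimal sketch of the method-scope mechanism described above, assuming `in_cdt()` simply compares the `CLOUD_PROVIDER` environment variable (default `"docker"`) against `"docker"`. `_ignore` is a hypothetical stand-in for ducktape's `ignore` mark so the example runs without ducktape; the PR's actual helpers in mode_checks.py may differ, and class and file scope are omitted here.

```python
import os
from typing import Any, Callable, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def _ignore(fn: F) -> F:
    # Hypothetical stand-in for ducktape's `ignore` mark: record an IGNORE
    # mark on the function object itself and hand back the same object.
    fn.__dict__.setdefault("marks", []).append("IGNORE")
    return fn


def in_cdt() -> bool:
    # Assumed detection: any CLOUD_PROVIDER other than "docker" means CDT;
    # an unset variable defaults to "docker" (local runs, dockerized CI).
    return os.environ.get("CLOUD_PROVIDER", "docker") != "docker"


def skip_in_cdt(fn: Optional[F] = None, *, reason: str = "") -> Any:
    def decorate(f: F) -> F:
        # `reason` is documentation only; it is never passed to `_ignore`.
        f.__dict__["_skip_in_cdt_reason"] = reason
        return _ignore(f) if in_cdt() else f

    if fn is not None:
        return decorate(fn)  # bare usage: @skip_in_cdt
    return decorate  # called usage: @skip_in_cdt(reason=...)
```

Because the check runs at decoration time, `CLOUD_PROVIDER` has to be set before the test module is imported; outside CDT the decorator returns the function untouched apart from the stored reason.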

Why these are CDT-skippable: each is a no-cloud, non-compute, non-scale suite, i.e. a pure API/CLI/protocol/control-plane surface whose behavior is identical on docker and on real EC2+S3. All of them already run on every PR in dockerized CI, so running them again in CDT adds runtime but no coverage. (Compute/IO suites such as data_transforms and log_compaction, and all of scale_tests/, are deliberately kept, because CDT is where they exercise real hardware.)
  • schema_registry_test — Schema Registry HTTP API; protocol surface, no cloud/hardware dependence.
  • pandaproxy_test — Pandaproxy HTTP API; same — pure REST surface.
  • redpanda_oauth_test — OAuth/OIDC auth flows; environment-independent.
  • acls_test — ACL authorization enforcement; logic-only, identical off-cloud.
  • scram_test — SASL/SCRAM auth; protocol/logic path, no cloud.
  • rbac_test — role-based access control; config/logic, env-independent.
  • gbac_claim_test — group access via OIDC claims; auth logic, no cloud.
  • security_report_test — security-report surface across interfaces; API-level.
  • audit_log_test — audit-event emission correctness; behavior-level, no storage/hardware sensitivity.
  • enterprise_features_license_test — license enforcement; gates on a sample-license flag, no cloud infra.
  • cluster_features_test — feature-flag activation state machine; control-plane logic.
  • maintenance_test — maintenance-mode drain/leadership via admin API; coordination logic.
  • leadership_transfer_test — leadership transfer/rebalance correctness; raft coordination, no storage or heavy compute.

Accounting

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@oleiman oleiman self-assigned this Jun 30, 2026
@oleiman oleiman requested a review from Copilot June 30, 2026 22:23
@oleiman oleiman added the claude-review Adding this label to a PR will trigger a workflow to review the code using claude. label Jun 30, 2026

Copilot AI left a comment

Contributor

Pull request overview

Adds an opt-out mechanism for ducktape tests to be skipped specifically in CDT (real cloud) runs, keeping local/docker CI behavior unchanged while still reporting skipped tests as IGNORE in CDT.

Changes:

  • Introduces in_cdt(), skip_in_cdt (method/class decorator), and skip_file_in_cdt (module-level helper) in mode_checks.py.
  • Adds unit tests validating method-, class-, and file-scope skipping behavior and ensuring reason is not forwarded to ducktape’s ignore.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/rptest/utils/mode_checks.py | Adds CDT detection and skip helpers that apply ducktape IGNORE marks only when running in real-cloud CDT. |
| tests/rptest/utils/skip_in_cdt_test.py | Adds unit tests for method/class/file scope CDT skipping and verifies ignore mark semantics. |

Comment thread tests/rptest/utils/mode_checks.py
@oleiman oleiman added claude-review Adding this label to a PR will trigger a workflow to review the code using claude. and removed claude-review Adding this label to a PR will trigger a workflow to review the code using claude. labels Jun 30, 2026
@claude

claude Bot commented Jun 30, 2026


Review

Nice, focused utility — gating everything on CLOUD_PROVIDER != "docker" so it no-ops locally and in dockerized CI is the right call, and using ducktape's ignore (so opted-out tests report as IGNORE rather than vanishing) is a good design choice. The unit test coverage for method/class/file scope, plus the regression guard that reason is not forwarded to ignore, is solid.

Main concern: class/file scope misses inherited test methods

_skip_class_in_cdt iterates vars(cls), which contains only the attributes defined directly on the class. A very common rptest pattern is a base class that defines the tests plus thin subclasses that only parametrize (e.g. TLSVersionTestBase → TLSVersionTestRSA/TLSVersionTestECDSA in tls_version_test.py). Applying @skip_in_cdt to such a subclass marks nothing, so the test still runs in CDT: the exact failure this PR aims to prevent, and a silent one (no error). skip_file_in_cdt has the same gap when the base class is imported from another module. Details and a suggested fix (walk the MRO) are in the inline comment.
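A small sketch of the gap and of the suggested MRO walk. `_is_test_method` is a hypothetical predicate (a `test_` name prefix), not how rptest actually detects tests, and `TLSVersionTestBase`/`TLSVersionTestRSA` here are toy classes that only mirror the shape of the tls_version_test.py hierarchy.

```python
import inspect
from typing import Dict


def _is_test_method(name: str, value: object) -> bool:
    # Hypothetical predicate: treat callables named test_* as tests.
    return name.startswith("test_") and callable(value)


def own_test_methods(cls: type) -> Dict[str, object]:
    # What iterating vars(cls) sees: only attributes defined on cls itself.
    return {n: v for n, v in vars(cls).items() if _is_test_method(n, v)}


def all_test_methods(cls: type) -> Dict[str, object]:
    # Walk the MRO from the most-derived class outward, so an override on a
    # subclass wins over the base definition of the same name.
    found: Dict[str, object] = {}
    for klass in inspect.getmro(cls):
        for n, v in vars(klass).items():
            if _is_test_method(n, v) and n not in found:
                found[n] = v
    return found


class TLSVersionTestBase:  # toy stand-in for the real base class
    def test_tls_version(self) -> None:
        pass


class TLSVersionTestRSA(TLSVersionTestBase):
    pass  # thin subclass: parametrizes only, defines no tests itself
```

Note that marking a method found through the MRO mutates the base class's function object, so every other subclass of that base, in any module, picks up the same mark.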

Minor notes

  • In-place-mutation reliance: _skip_method_in_cdt calls ignore(method) and discards the return value, relying on ignore mutating in place, while _skip_class_in_cdt uses setattr(..., ignore(member)). Both work given ducktape's ignore both mutates and returns the same object, but the two helpers are inconsistent — worth aligning for clarity.
  • reason is invisible at runtime: it's stored on the target but never logged, so a CDT IGNORE report won't show why a test was skipped. Not blocking, but a short vlog/print at decoration time (or surfacing it in the ignore output) would aid triage. The PR body already explains why it can't be passed to ignore directly — good.
  • Per CLAUDE.md, run tools/type-checking/type-check.sh check utils/mode_checks.py utils/skip_in_cdt_test.py on the changed files before merge.

Overall the approach is sound; the inherited-test-method gap is the one item I'd want addressed (or at least loudly documented) before this lands, since it can silently fail to skip.

Comment thread tests/rptest/utils/mode_checks.py Outdated
@oleiman oleiman force-pushed the ci/noticket/opt-out-cdt branch from c2a250c to d99bbc5 Compare July 1, 2026 06:35
@oleiman

oleiman commented Jul 1, 2026

Member Author

/cdt
rp_version=build
tests/rptest/tests/schema_registry_test.py

@oleiman

oleiman commented Jul 1, 2026

Member Author

/cdt
tests/rptest/tests/audit_log_test.py
tests/rptest/tests/acls_test.py
tests/rptest/tests/scram_test.py
tests/rptest/tests/enterprise_features_license_test.py
tests/rptest/tests/redpanda_oauth_test.py
tests/rptest/tests/pandaproxy_test.py
tests/rptest/tests/data_transforms_test.py
tests/rptest/tests/gbac_claim_test.py
tests/rptest/tests/cluster_features_test.py
tests/rptest/tests/rbac_test.py
tests/rptest/tests/log_compaction_test.py
tests/rptest/tests/security_report_test.py
tests/rptest/tests/maintenance_test.py
tests/rptest/tests/leadership_transfer_test.py

@oleiman

oleiman commented Jul 1, 2026

Member Author

/cdt
tests/rptest/tests/schema_registry_test.py

@oleiman oleiman force-pushed the ci/noticket/opt-out-cdt branch from 2379a47 to d99bbc5 Compare July 2, 2026 02:24
@oleiman

oleiman commented Jul 2, 2026

Member Author

/cdt
tests/rptest/tests/acls_test.py
tests/rptest/tests/audit_log_test.py
tests/rptest/tests/cluster_features_test.py
tests/rptest/tests/enterprise_features_license_test.py
tests/rptest/tests/gbac_claim_test.py
tests/rptest/tests/leadership_transfer_test.py
tests/rptest/tests/maintenance_test.py
tests/rptest/tests/pandaproxy_test.py
tests/rptest/tests/rbac_test.py
tests/rptest/tests/redpanda_oauth_test.py
tests/rptest/tests/schema_registry_test.py
tests/rptest/tests/scram_test.py
tests/rptest/tests/security_report_test.py

@oleiman oleiman force-pushed the ci/noticket/opt-out-cdt branch from d99bbc5 to 097f0ad Compare July 2, 2026 21:44
@oleiman oleiman requested a review from Copilot July 2, 2026 21:45
@oleiman oleiman added claude-review Adding this label to a PR will trigger a workflow to review the code using claude. and removed claude-review Adding this label to a PR will trigger a workflow to review the code using claude. labels Jul 2, 2026

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.

Comment thread tests/rptest/utils/mode_checks.py
Comment thread tests/rptest/utils/skip_in_cdt_test.py
@claude

claude Bot commented Jul 2, 2026


Review: skip_in_cdt CDT opt-out

Reviewed the mechanism in mode_checks.py, the unit tests, and all 13 opt-out sites. This is a well-designed, well-documented change. The in_cdt() gating (default "docker", mirroring the existing get_cloud_provider() env read) keeps local and dockerized-CI behavior byte-for-byte unchanged, and reporting opted-out tests as IGNORE rather than dropping them is the right call. Deliberately not forwarding reason to ducktape's ignore() (which would misread it as a parametrization matcher and un-skip the test) is a subtle correctness point that's handled well and guarded by a regression assertion.

Verification I did

  • .marks detection works with the real decorator. The unit test uses ducktape's raw cluster, but the actual files use rptest.services.cluster.cluster; that wrapper propagates .marks (services/cluster.py:328), so _mark_own_cdt_tests/skip_file_in_cdt correctly find the test methods. (Left an inline note suggesting the test pin this contract with the wrapper.)
  • No accidental over-skip across modules. Because marks land on the shared function object, I checked every cross-module import of the opted-out files' classes (schema_registry_scale_test, admin_api_auth_test, security_report_test, rpk_registry_test, tls_metrics_test, cluster_linking_*). The imported bases (SchemaRegistryEndpoints, PandaProxyEndpoints, AuditLogTestBase, User, PandaProxyTLSProvider, …) either aren't Test subclasses or expose no @cluster test methods, so no non-opted-out module gets its CDT tests skipped as a side effect.
  • All 13 sites are effective. Each concrete test class inherits its tests from an in-module base (or defines them directly), so skip_file_in_cdt's "mark each module class's own methods" reaches the whole suite. None rely on cross-module inherited tests, so nothing is silently missed here.

Suggestions (non-blocking)

  1. skip_file_in_cdt fails silently when a module's tests are inherited cross-module — unlike the class decorator, which raises. A warnings.warn/log when a matched class has only inherited (cross-module) test methods would make a misapplied opt-out discoverable instead of a quiet regression. (inline)
  2. Test the rptest cluster wrapper path, not just ducktape's raw cluster, so a regression in the .marks propagation that this whole mechanism depends on would be caught. (inline)
  3. Minor: passing reason positionally (@skip_in_cdt("text")) is a sharp edge — it's caught loudly at import (str isn't callable), and the @overloads steer correct usage, so this is cosmetic.

No correctness or security concerns. The two inline notes are hardening for future callers, not blockers.

@oleiman oleiman force-pushed the ci/noticket/opt-out-cdt branch from 097f0ad to 908a606 Compare July 2, 2026 23:31
@oleiman oleiman marked this pull request as ready for review July 2, 2026 23:43
@oleiman oleiman requested a review from Copilot July 2, 2026 23:47

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comment thread tests/rptest/utils/mode_checks.py
Comment thread tests/rptest/utils/skip_in_cdt_test.py
@vbotbuildovich

vbotbuildovich commented Jul 3, 2026

Collaborator

CI test results

test results on build#86689

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ConsumerGroupBalancingTest | test_coordinator_nodes_balance | null | integration | https://buildkite.com/redpanda/redpanda/builds/86689#019f253d-4a18-4d0f-97bc-523d4f5f6db6 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0031, p0=1.0000, reject_threshold=0.0100, adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerGroupBalancingTest&test_method=test_coordinator_nodes_balance |

test results on build#86695

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkingRandomOpsTest | test_node_operations | {"failures": true, "workload_set": "cloud_combos"} | integration | https://buildkite.com/redpanda/redpanda/builds/86695#019f2662-cde4-4305-a620-13b3be999a88 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0184, p0=1.0000, reject_threshold=0.0100, adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations |

oleiman added 14 commits July 2, 2026 22:01
CDT runs every ducktape test on real cloud infrastructure, and a
whole-run timeout fails the entire run. Some tests (e.g. exhaustive
broker HTTP-API tests) add runtime there without adding cloud-infra
coverage.

skip_in_cdt opts out a test method or a whole class; skip_file_in_cdt
opts out every test class in a module. All are gated on
CLOUD_PROVIDER != "docker", so they no-op locally and in dockerized CI
and the default behavior is unchanged. In CDT they attach ducktape's
ignore mark, so opted-out tests report as IGNORE rather than vanishing
from the run.

reason is documentation only; it is never passed to ducktape's ignore,
which would otherwise treat it as a parametrization matcher and
silently un-skip the test.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
The exhaustive schema-registry HTTP-API tests are the largest single
contributor to a CDT run (hundreds of test instances) and exercise no
cloud infrastructure -- the dockerized CI run already covers them, so
running them again on real cloud only adds wall-clock.

skip_file_in_cdt marks each module class's own test methods. The test
methods live on in-module base classes and the concrete classes inherit
them, so marking the bases covers the whole suite. No-op locally and in
dockerized CI.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the ci/noticket/opt-out-cdt branch from 908a606 to 3e76378 Compare July 3, 2026 05:01
@oleiman oleiman requested a review from Copilot July 3, 2026 05:26
@oleiman oleiman added claude-review Adding this label to a PR will trigger a workflow to review the code using claude. and removed claude-review Adding this label to a PR will trigger a workflow to review the code using claude. labels Jul 3, 2026
Comment on lines +321 to +323
Because that mark lives on the function object itself, a subclass of the same
base in ANOTHER module would also be skipped in CDT (safe as long as opted-out
bases share no test methods with non-opted-out suites). A class here that


The docstring correctly notes this footgun, but there is an asymmetry worth calling out because it is a genuine silent-failure risk for future work:

  • skip_file_in_cdt warns when a class in this module inherits test methods from another module (_cross_module_cdt_test_methods).
  • There is no guard for the reverse: a base defined here whose methods get marked, then subclassed by a non-opted-out suite in another module. Those inherited tests will silently vanish from CDT with no warning anywhere, because the IGNORE mark rides on the shared function object.

Today this is safe only by accident — SchemaRegistryEndpoints/AuditLogTestBase define no test methods, so the only marked bases with tests (SchemaRegistryTestMethods, etc.) are subclassed exclusively in-module. But if someone later writes class Foo(SchemaRegistryTestMethods) in a new file (e.g. a scale/CDT-only suite), it will be silently skipped in CDT with zero signal.

That reverse direction is fundamentally undetectable at this module's import time, which is exactly why it's dangerous. Consider a follow-up guard that doesn't depend on function-object marks — e.g. an opt-out registry checked by the ducktape loader, or asserting at collection that no marked base is subclassed by an unmarked class. At minimum, please make the "share no test methods with non-opted-out suites" invariant explicit at the call sites, since it's load-bearing.

# coverage (the dockerized CI run already exercises it end to end). The test
# methods live on the base classes above; skip_file_in_cdt marks each module
# class's own methods, so the thin concrete subclasses inherit the mark.
skip_file_in_cdt(


A module-scope opt-out also pulls in SchemaRegistryTransportStressTest (and its *RpcTransport*/*KafkaClientTransport* subclasses), which does concurrent SR read/write while transferring leadership of _schemas to check resilience under real network/timing. That is arguably one of the SR tests where real EC2 (vs. loopback docker) network + scheduling behavior does add signal, unlike the pure HTTP-API surface the PR body describes.

Is opting the transport-stress tests out of CDT intentional? If so, a one-line note in the accounting justifying it would help; if not, consider a method-scope @skip_in_cdt on the HTTP-API methods instead of a whole-file opt-out, so the resilience tests keep their real-hardware coverage.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

Comment on lines +298 to +302
if args:
    # bare usage: @skip_in_cdt
    return decorate(args[0])
# called usage: @skip_in_cdt(reason=...)
return decorate
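A self-contained sketch of the bare-vs-called dispatch in that snippet, with typing overloads and an explicit guard for the positional-`reason` mistake. The in-CDT marking step is deliberately omitted, and the guard is an illustrative addition, not something the diff is known to contain.

```python
from typing import Any, Callable, TypeVar, overload

F = TypeVar("F", bound=Callable[..., Any])


@overload
def skip_in_cdt(target: F, /) -> F: ...


@overload
def skip_in_cdt(*, reason: str = ...) -> Callable[[F], F]: ...


def skip_in_cdt(*args: Any, reason: str = "") -> Any:
    def decorate(target: F) -> F:
        target.__dict__["_skip_in_cdt_reason"] = reason  # documentation only
        return target

    if args:
        if len(args) != 1 or not callable(args[0]):
            # Catches @skip_in_cdt("text") with a clear message instead of
            # the later "'str' object is not callable".
            raise TypeError(
                "skip_in_cdt: pass reason as a keyword, "
                "e.g. @skip_in_cdt(reason=...)")
        return decorate(args[0])  # bare usage: @skip_in_cdt
    return decorate  # called usage: @skip_in_cdt(reason=...)
```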
@claude

claude Bot commented Jul 3, 2026


Review summary

Nicely scoped change with a thoughtfully-designed mechanism and good test coverage. The in_cdt() env-gating means default/local/dockerized-CI behavior is genuinely unchanged, and the decision to attach ducktape's ignore mark (report as IGNORE rather than dropping tests) is the right call for visibility. The reason-not-forwarded-to-ignore subtlety is real and worth the regression guard in the unit test.

What I verified

  • The "function-object mark is shared across subclasses" mechanism is correct for every opted-out file in this PR: the only shared bases inherited cross-module (SchemaRegistryEndpoints, AuditLogTestBase) define no test methods, and the bases that do hold tests (SchemaRegistryTestMethods, the SR *TestBase classes) are subclassed only in-module. So the non-opted-out consumers I checked — scale_tests/schema_registry_scale_test.py, admin_api_auth_test.py, cluster_linking_* — inherit no marked methods and keep running in CDT. Good.
  • skip_file_in_cdt only touching __module__-matching classes, and only their directly-defined marked methods, is correct; MaintenanceTestBase (no own tests) etc. don't trigger spurious warnings.
  • _skip_in_cdt_reason is documentation-only; the unit test pins that it is not passed to ignore.

Main points (both inline)

  1. Asymmetric cross-module guard (mode_checks.py) — the code warns when a class here inherits tests from elsewhere, but there is no signal for the reverse: a base marked here later subclassed by a non-opted-out suite in another module would silently vanish from CDT. Safe today only because the marked-and-shared bases happen to have no external subclasses; it's a load-bearing invariant that isn't enforced. Worth a follow-up guard or at least an explicit note at the call sites.
  2. Transport-stress tests in the SR opt-out — a whole-file opt-out also removes SchemaRegistryTransportStressTest (concurrent SR I/O under _schemas leadership transfer), which is plausibly a case where real-hardware network/timing does add signal. Please confirm that's intentional, or narrow to method-scope @skip_in_cdt.

Neither is blocking — no correctness bug in the current diff, no security/perf concerns. The two items are about protecting the invariant against future edits and confirming the coverage tradeoff.


@oleiman oleiman marked this pull request as draft July 3, 2026 16:14

Labels

claude-review Adding this label to a PR will trigger a workflow to review the code using claude.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants