tests: scale coverage for shadow-link role sync#30978
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens controller snapshotting of RBAC role data to avoid reactor stalls and quadratic behavior when role membership is large, and adds an end-to-end scale test to exercise the shadow-link roles migrator across multiple high-cardinality shapes.
Changes:
- Add
security::role_store::all_roles_with_members()to build a complete roles+members view in one pass over the member store, with periodic cooperative yields. - Update
cluster::security_manager::fill_snapshot()to snapshot roles via the new single-pass/yielding path. - Add
ShadowLinkRoleSyncScaleTestducktape coverage that drives create/update/delete lifecycle at scale across three parametrized role/member distributions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/rptest/tests/shadow_link_role_sync_test.py | Adds a parametrized scale characterization test for shadow-link role sync across multiple cardinality profiles. |
| src/v/security/role.cc | Implements role_store::all_roles_with_members() as a yielding, single-pass roles+members enumerator. |
| src/v/security/role_store.h | Declares the new async/yielding enumeration API with usage constraints documented. |
| src/v/cluster/security_manager.cc | Switches snapshot role enumeration to all_roles_with_members() and yields during snapshot construction. |
| // IMPORTANT: intended solely for security_manager::fill_snapshot. This | ||
| // suspends mid-iteration, which is safe ONLY while the caller holds | ||
| // mux_state_machine's _apply_mtx: that mutex serializes against command | ||
| // application, so _roles/_members_store can't mutate across a yield. A | ||
| // caller without that lock risks iterating a container that changes | ||
| // underfoot. | ||
| // | ||
| // Like roles_with_members, but enumerates every role with its members in a | ||
| // single pass and yields periodically, so snapshotting a very large role | ||
| // store doesn't stall the reactor. | ||
| ss::future<chunked_vector<role_with_members>> | ||
| all_roles_with_members() const; |
There was a problem hiding this comment.
@dotnwat picking up your question from #30946 (#30946 (comment)) about what keeps the loops in all_roles_with_members safe against concurrent modification now that they can suspend.
The only intended caller is security_manager::fill_snapshot, which should only run while the controller apply mutex is held. Looks like applying commands takes that same mutex, so command application and snapshotting should be mutually exclusive and the role store shouldn't mutate between yields.
It's a little loose imo, but this seems to be the existing convention for fill_snapshot. The other calls next to it also iterate the live stores for credentials and ACLs and yield the same way - and it looks like they're subtly relying on this apply mutex as well. I've tried to shape the comment here to spell out the apply-mutex precondition and note that it's meant solely for fill_snapshot, but not sure if that's solid enough or if fill_snapshot as a whole should be revisited.
| // Ephemeral credentials must not be stored in the snapshot. | ||
| auto creds = _credentials.local().range( | ||
| security::credential_store::is_not_ephemeral); | ||
| for (const auto& cred : creds) { | ||
| ss::visit(cred.second, [&](security::scram_credential scram) { | ||
| snapshot.user_credentials.push_back( | ||
| user_and_credential{ | ||
| security::credential_user{cred.first}, std::move(scram)}); | ||
| }); | ||
| co_await ss::coroutine::maybe_yield(); | ||
| } | ||
|
|
||
| snapshot.acls = co_await _authorizer.local().all_bindings(); |
There was a problem hiding this comment.
@dotnwat these are what I was referring to in my other comment.
Retry command for Build#86563please wait until all jobs are finished before running the slash command |
beca13a to
6326d14
Compare
| // IMPORTANT: intended solely for security_manager::fill_snapshot. This | ||
| // suspends mid-iteration, which is safe ONLY while the caller holds | ||
| // mux_state_machine's _apply_mtx: that mutex serializes against command | ||
| // application, so _roles/_members_store can't mutate across a yield. A | ||
| // caller without that lock risks iterating a container that changes | ||
| // underfoot. | ||
| // | ||
| // Like roles_with_members, but enumerates every role with its members in a | ||
| // single pass and yields periodically, so snapshotting a very large role | ||
| // store doesn't stall the reactor. | ||
| ss::future<chunked_vector<role_with_members>> | ||
| all_roles_with_members() const; |
Retry command for Build#86619please wait until all jobs are finished before running the slash command |
fill_snapshot enumerated roles by calling role_store::get() once per role, and get() rescans the entire member store on every call, so building the snapshot was O(roles * members). At thousands of roles this monopolizes the controller reactor long enough to starve Raft heartbeats and destabilize controller leadership. Switch it to all_roles_with_members(), a new async sibling of roles_with_members() that builds every role's members in one pass over the member store and yields periodically. roles_with_members() stays synchronous for the read paths, which run without the controller apply mutex; yielding mid-iteration is safe only in the snapshot path, which holds that mutex, so no command mutates the store across the yields. The stall surfaced under the role-sync scale test added in the next commit, which drives thousands of roles through a full create, update, and delete lifecycle.
Add ShadowLinkRoleSyncScaleTest, which drives the roles migrator through a full create, update, and delete lifecycle at volume. Three parametrized points each stress one cost axis at a constant 5000 total members: many roles (5000 x 1), wide membership (50 x 100), and a large single member set (5 x 1000). Each phase asserts the destination mirrors the source, and the task settles ACTIVE at the end. Source mutations retry the transient not_leader/timeout errors that thousands of rapid role writes can provoke, and the controller-log guard is raised to admit the ~3*num_roles records the lifecycle writes.
Seed a role on the source, enable role_sync_options via rpk, and assert the role and its member land on the shadow cluster. Depends on the rpk role-sync config support cherry-picked above.
6326d14 to
58ec891
Compare
Part of ENG-898 (shadowing roles). Adds scale coverage for the role-sync
migrator (CORE-16456), exercising sync against a large role set well beyond
current fleet usage (which tops out around ~200 roles per cluster).
tests:ShadowLinkRoleSyncScaleTestdrives the migrator through afull create/update/delete lifecycle at 5000 total members across three
cost axes -- many roles (5000x1), wide membership (50x100), and a large
per-role member set (5x1000) -- asserting the destination mirrors the
source at each phase and the task settles
ACTIVE.Drive-by fix surfaced by the scale test:
security_manager::fill_snapshotpreviously had a per-role
get()loop that rescanned the member storeon every call (quadratic) and could stall the controller reactor under large
role counts. Added
role_store::all_roles_with_members()to allow it tosnapshot the roles in a single pass over the member store and yields
periodically. This is preventative hardening rather than a reported bug -
the stall only bites well above today's fleet usage.
Fixes CORE-16768
Backports Required
Release Notes