Snapshots support multi-project #130000

Open
wants to merge 15 commits into
base: main
from

Conversation

ywangd
Member

@ywangd ywangd commented Jun 25, 2025

This PR makes snapshot service code and APIs multi-project compatible.

Resolves: ES-10225
Resolves: ES-10226

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Jun 25, 2025
@ywangd ywangd added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jun 26, 2025
@ywangd ywangd requested review from pxsalehi and DaveCTurner June 26, 2025 04:39
@ywangd ywangd marked this pull request as ready for review June 26, 2025 04:40
@ywangd ywangd requested a review from a team as a code owner June 26, 2025 04:40
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jun 26, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@ywangd
Member Author

ywangd commented Jun 26, 2025

Most of the changes are in SnapshotsService and BlobStoreRepository. The rest is mostly cascading. Initially I wanted separate PRs for the clone, cleanup and maybe get-status APIs. But that does not make much sense, since there is a large amount of code overlap and reuse among all the APIs: it is better and easier to reason about if the entirety of SnapshotsService is made project-aware at once. Restore is still left out. It is handled by different classes and will be addressed separately.

* @param projectId The project that the repository belongs to
* @param name Name of the repository
*/
public record ProjectRepo(ProjectId projectId, String name) implements Writeable {
Member Author

This is an existing class extracted from RepositoryOperation.

Comment on lines +1828 to +1835
final var projectMetadata = clusterMetadata.getProject(getProjectId());
executor.execute(ActionRunnable.run(allMetaListeners.acquire(), () -> {
    if (finalizeSnapshotContext.serializeProjectMetadata()) {
        PROJECT_METADATA_FORMAT.write(projectMetadata, blobContainer(), snapshotId.getUUID(), compress);
    } else {
        GLOBAL_METADATA_FORMAT.write(clusterMetadata, blobContainer(), snapshotId.getUUID(), compress);
    }
}));
Member Author

This is where we conditionally write ProjectMetadata for multi-project snapshots. The Metadata in this case is a thin wrapper around ProjectMetadata, so that the existing finalization-related classes can be reused.
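
A minimal sketch of that branching, for readers who want to see it in isolation. Everything here is a hypothetical stand-in (the `MetadataWriteSketch` class, its toy `Metadata` and `ProjectMetadata` records, the in-memory map used as a blob store, and the `project-`/`global-` blob name prefixes); it is not the Elasticsearch implementation, only an illustration of choosing between a per-project and a cluster-wide write based on one flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the Elasticsearch classes.
final class MetadataWriteSketch {
    record ProjectMetadata(String projectId, String settings) {}

    // Toy cluster-level metadata that simply holds the metadata of each project.
    record Metadata(Map<String, ProjectMetadata> projects) {
        ProjectMetadata getProject(String projectId) {
            return projects.get(projectId);
        }
    }

    // Writes either the single project's metadata or the whole cluster metadata,
    // depending on the flag. The returned map plays the role of the blob container.
    static Map<String, String> write(
        Metadata clusterMetadata,
        String projectId,
        boolean serializeProjectMetadata,
        String snapshotUuid
    ) {
        Map<String, String> blobs = new HashMap<>();
        if (serializeProjectMetadata) {
            // Assumed blob name; the real naming scheme is not shown in this PR excerpt.
            blobs.put("project-" + snapshotUuid, clusterMetadata.getProject(projectId).toString());
        } else {
            blobs.put("global-" + snapshotUuid, clusterMetadata.toString());
        }
        return blobs;
    }
}
```

Exactly one of the two blobs is written per snapshot in this sketch, which mirrors the if/else in the diff above.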

Comment on lines +3855 to +3876
private void startExecutableClones(SnapshotsInProgress snapshotsInProgress) {
    for (List<SnapshotsInProgress.Entry> entries : snapshotsInProgress.entriesByRepo()) {
        startExecutableClones(entries);
    }
}

/**
 * Maybe kick off new shard clone operations for all repositories of the specified project
 */
private void startExecutableClones(SnapshotsInProgress snapshotsInProgress, ProjectId projectId) {
    for (List<SnapshotsInProgress.Entry> entries : snapshotsInProgress.entriesByRepo(projectId)) {
        startExecutableClones(entries);
    }
}

/**
 * Maybe kick off new shard clone operations for the single specified project repository
 */
private void startExecutableClones(SnapshotsInProgress snapshotsInProgress, ProjectRepo projectRepo) {
    startExecutableClones(snapshotsInProgress.forRepo(Objects.requireNonNull(projectRepo)));
}

Member Author

Snapshotting is a state machine that triggers the next operation when the current operation finishes. In most cases, the triggering is confined to the same repository. This is the simplest case and is migrated as is. The other case is triggering across all repositories. In a multi-project (MP) setup, this could mean either across all repositories of all projects or across all repositories of a single project. This is the reason for the 3 variants of the same-named method here. The principles that I have applied are:

  1. If the scope was a single repository, keep it as is.
  2. If the scope was all repositories and reacting to cluster state changes, i.e. applyClusterState, it applies to all repositories across all projects.
  3. If the scope was all repositories and happening after completing a particular snapshot operation, e.g. deleting a snapshot entry, it applies to all repositories of the single project that the operation is associated with.
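
The three scopes above can be sketched with a toy model. The `CloneScopeSketch` class, its simplified `ProjectRepo` and `Entry` records (with plain strings instead of `ProjectId`), and the method names are hypothetical and exist only for illustration; the real code works on `SnapshotsInProgress`, whose API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; not the Elasticsearch classes.
final class CloneScopeSketch {
    record ProjectRepo(String projectId, String name) {}

    record Entry(ProjectRepo repo, String snapshot) {}

    // Principle 2: all repositories across all projects (e.g. reacting to a cluster state change).
    static List<Entry> forAllProjects(List<Entry> entries) {
        return List.copyOf(entries);
    }

    // Principle 3: all repositories of the single project an operation belongs to.
    static List<Entry> forProject(List<Entry> entries, String projectId) {
        Objects.requireNonNull(projectId);
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.repo().projectId().equals(projectId)) {
                result.add(entry);
            }
        }
        return result;
    }

    // Principle 1: a single repository, identified by project and repository name together.
    static List<Entry> forRepo(List<Entry> entries, ProjectRepo repo) {
        Objects.requireNonNull(repo);
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.repo().equals(repo)) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

Note that two projects may each have a repository with the same name, which is why the single-repository scope in this sketch is keyed by the (project, name) pair rather than by name alone.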

Comment on lines +1838 to +1841
private static Tuple<ClusterState, List<SnapshotDeletionsInProgress.Entry>> readyDeletions(
    ClusterState currentState,
    @Nullable ProjectId projectId
) {
Member Author

This is another example of different scopes for triggering the next operation. In this case, there is no single-repository scope; the scope is either cluster-wide (when projectId == null) or a single project.
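
A small sketch of a nullable scope parameter, under the assumption that a null project ID selects every project and a non-null one restricts the result to that project. The `ReadyDeletionsSketch` class, its `Deletion` record and the `ready` flag are hypothetical; the real `readyDeletions` also returns an updated `ClusterState`, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the Elasticsearch classes.
final class ReadyDeletionsSketch {
    record Deletion(String projectId, String repository, boolean ready) {}

    // projectIdOrNull == null is assumed to mean "all projects" (cluster-wide scope);
    // otherwise only deletions belonging to that project are considered.
    static List<Deletion> readyDeletions(List<Deletion> deletions, String projectIdOrNull) {
        List<Deletion> result = new ArrayList<>();
        for (Deletion deletion : deletions) {
            boolean inScope = projectIdOrNull == null || deletion.projectId().equals(projectIdOrNull);
            if (inScope && deletion.ready()) {
                result.add(deletion);
            }
        }
        return result;
    }
}
```

Using one nullable parameter keeps a single code path for both scopes, at the cost of the caller having to know what null means.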

@ywangd
Member Author

ywangd commented Jun 27, 2025

Depending on how we handle project soft-deletion and/or clean-up, snapshots may see a project getting concurrently deleted and therefore fail. This PR does not attempt to handle it more gracefully, since that would become noise and get buried in the large amount of namespacing changes. I can raise a separate ticket to track this work.

@pxsalehi
Member

Depending on how we handle project soft-deletion and/or clean-up, snapshots may see a project getting concurrently deleted and therefore fail. This PR does not attempt to handle it more gracefully, since that would become noise and get buried in the large amount of namespacing changes. I can raise a separate ticket to track this work.

yeah, we'd need a new ticket for this under the soft-deletion epic. We briefly mentioned in the design doc that once the project is marked for deletion, we should 1) prevent any new snapshots from being scheduled/requested. This partially goes back to making those internal actions check for the project deletion block. 2) cancel any ongoing snapshotting for that project (I guess not that simple, but it should at least fail gracefully and not blow up). For 1, we have ES-12121. But I don't think 2 has a ticket yet.

Member

@pxsalehi pxsalehi left a comment

LGTM. To my eyes these all look like straightforward changes to trickle the project ID down everywhere necessary. I don't have strong opinions about the details. However, considering that the snapshotting code is convoluted and in a delicate state, I'm gonna defer the final approval to David (or anyone else with more snapshotting experience).

Comment on lines +2092 to +2094
if (token == null) {
    token = parser.nextToken();
}
Member

what's this all about?

@@ -2014,7 +2028,7 @@ private void getOneSnapshotInfo(BlockingQueue<SnapshotId> queue, GetSnapshotInfo
 Exception failure = null;
 SnapshotInfo snapshotInfo = null;
 try {
-    snapshotInfo = SNAPSHOT_FORMAT.read(metadata.name(), blobContainer(), snapshotId.getUUID(), namedXContentRegistry);
+    snapshotInfo = SNAPSHOT_FORMAT.read(getProjectRepo(), blobContainer(), snapshotId.getUUID(), namedXContentRegistry);
Member

For non-MP, is getProjectRepo() now always using DEFAULT as the project name here?

Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Coordination Meta label for Distributed Coordination team v9.2.0
3 participants