feat(container-loader): add captureFullContainerState free function #27220
markfields wants to merge 21 commits into microsoft:main from …
Adds a driver-only free function that captures a container's current state in the IPendingContainerState wire format using only an IDocumentServiceFactory and IUrlResolver. Unlike Container.getPendingLocalState(), no runtime or codeLoader is instantiated: the function fetches the latest snapshot, reads the authoritative sequence number from the snapshot's attributes blob, drains ops from delta storage from that sequence number, and serializes the result. pendingRuntimeState is undefined, so the output is intended for state relay, inspection, and durable-state snapshot use cases rather than rehydrating in-flight DDS changes. The output can be fed back into loadExistingContainer or loadFrozenContainerFromPendingState as pendingLocalState. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
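The capture flow described in this commit fits in a few lines. This is a sketch under simplifying assumptions, not the PR's code. `SnapshotTreeLike`, `StorageLike`, `DeltaStorageLike`, `OpLike`, and `captureSketch` are hypothetical stand-ins for the real driver interfaces. `fetchOps` replaces the real `fetchMessages` stream. The attributes blob is assumed to live at `.protocol/attributes` as JSON with a `sequenceNumber` field.

```typescript
// Hypothetical, simplified shapes; the real code uses IDocumentStorageService,
// the delta storage service, ISnapshotTree, and ISequencedDocumentMessage.
interface SnapshotTreeLike {
	blobs: Record<string, string>; // blob path -> storage id
	trees: Record<string, SnapshotTreeLike>;
}

interface StorageLike {
	getSnapshotTree(): Promise<SnapshotTreeLike | undefined>;
	readBlob(id: string): Promise<Uint8Array>;
}

interface OpLike {
	sequenceNumber: number;
	type: string;
}

interface DeltaStorageLike {
	// Hypothetical simplification: returns all ops with sequenceNumber >= from.
	fetchOps(from: number): Promise<OpLike[]>;
}

interface CapturedState {
	baseSnapshot: SnapshotTreeLike;
	snapshotSequenceNumber: number;
	savedOps: OpLike[];
	pendingRuntimeState: undefined; // no runtime is instantiated
}

async function captureSketch(
	storage: StorageLike,
	deltas: DeltaStorageLike,
): Promise<CapturedState> {
	const baseSnapshot = await storage.getSnapshotTree();
	if (baseSnapshot === undefined) {
		throw new Error("Document has no snapshot to capture");
	}
	// Assumed location of the protocol attributes blob.
	const attributesId = baseSnapshot.trees[".protocol"]?.blobs["attributes"];
	if (attributesId === undefined) {
		throw new Error("Snapshot has no attributes blob");
	}
	// The snapshot's own attributes give the authoritative sequence number.
	const attributes = JSON.parse(
		new TextDecoder().decode(await storage.readBlob(attributesId)),
	) as { sequenceNumber: number };
	// Drain every op after that sequence number.
	const savedOps = await deltas.fetchOps(attributes.sequenceNumber + 1);
	return {
		baseSnapshot,
		snapshotSequenceNumber: attributes.sequenceNumber,
		savedOps,
		pendingRuntimeState: undefined,
	};
}
```

No runtime runs here, so `pendingRuntimeState` is always `undefined`. That is why the output suits relay and inspection rather than rehydrating in-flight DDS changes.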
…tainerPendingState Extends captureContainerPendingState so the driver-level state is fully self-contained for blob reads as well. Attachment blob bytes are fetched and added to snapshotBlobs keyed by storage ID. ContainerStorageAdapter already serves those entries through its cache, so no wire-format change is required. GC state is consulted when present: blobs that GC has explicitly marked as unreferenced (unreferencedTimestampMs), tombstoned, or deleted are skipped. Blobs absent from the GC graph are kept, because GC lag can leave recently attached blobs off the graph and dropping them would lose live data. When the snapshot has no GC tree (GC disabled or pre-GC document), every attachment blob from the BlobManager redirect table is included. The relevant blob manager / GC constants and the minimal parsing logic are duplicated locally to avoid a loader → runtime dependency; comments point back to the canonical definitions in container-runtime and runtime-definitions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
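The keep/skip rule above can be written as a small pure function. This is a sketch. `GcInfo` and `selectAttachmentBlobs` are hypothetical, and the GC state is assumed to be parsed already. The `/_blobs/<localId>` node-path format is an assumption about how attachment blobs appear in the GC graph; the real code uses constants duplicated from container-runtime.

```typescript
// Hypothetical, pre-parsed view of GC state. Passing `undefined` instead
// means the snapshot had no GC tree (GC disabled or pre-GC document).
interface GcInfo {
	// GC node id -> node data; unreferencedTimestampMs is set when GC marked it unreferenced.
	nodes: Map<string, { unreferencedTimestampMs?: number }>;
	tombstones: string[];
	deletedNodes: string[];
}

// redirectTable maps attachment-blob localId -> storageId.
// Returns the storage ids whose bytes should be inlined.
function selectAttachmentBlobs(
	redirectTable: Map<string, string>,
	gc: GcInfo | undefined,
): Set<string> {
	const keep = new Set<string>();
	const dead = new Set<string>([...(gc?.tombstones ?? []), ...(gc?.deletedNodes ?? [])]);
	redirectTable.forEach((storageId, localId) => {
		if (gc !== undefined) {
			const nodeId = `/_blobs/${localId}`; // assumed GC node path
			if (dead.has(nodeId)) return; // tombstoned or deleted: skip
			// Explicitly marked unreferenced: skip.
			// Absent from the graph: keep (tolerates GC lag).
			if (gc.nodes.get(nodeId)?.unreferencedTimestampMs !== undefined) return;
		}
		keep.add(storageId);
	});
	return keep;
}
```

Note the asymmetry: an explicit unreferenced mark drops the blob, while simple absence from the graph keeps it.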
…ware
Extends the driver-only capture to cover the whole referenced graph of the
container, not just attachment blobs.
- Honours ISnapshotTree.unreferenced: a shared tree walker skips any subtree
flagged unreferenced by the summarizer (which sets the flag from GC state)
and inlines contents of every other blob it reaches. Replaces the
unfiltered getBlobContentsFromTree path.
- Pre-fetches loading-group snapshots: enumerates groupIds on the base
snapshot (skipping unreferenced subtrees), fetches each via
IDocumentStorageService.getSnapshot({ versionId, loadingGroupIds }), runs
the fetched snapshot through the same tree walker, and serialises the
result into IPendingContainerState.loadedGroupIdSnapshots. If the driver
lacks getSnapshot support or no groupIds are declared, no groups are
included.
- GC parsing is done once and shared between tree-level and attachment-blob
filtering. captureReferencedAttachmentBlobs now takes pre-parsed GC data.
Renames captureAttachmentBlobs.ts to captureReferencedContents.ts to reflect
the broader scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
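The reachability-aware walk can be sketched as a synchronous traversal that returns referenced blob ids and declared group ids. The fetch happens in a separate step (a later commit in this PR moves to exactly that collect-then-fetch shape). `TreeNode` is a hypothetical reduction of `ISnapshotTree` to its `blobs`, `trees`, `unreferenced`, and `groupId` fields. `collectReferenced` is an illustrative name, not the PR's function.

```typescript
// Hypothetical reduction of ISnapshotTree to the fields this walk needs.
interface TreeNode {
	blobs: Record<string, string>; // path -> blob id
	trees: Record<string, TreeNode>;
	unreferenced?: true;
	groupId?: string;
}

interface WalkResult {
	blobIds: Set<string>; // deduplicated ids of blobs to inline
	groupIds: Set<string>; // deduplicated loading-group ids to fetch
}

function collectReferenced(root: TreeNode): WalkResult {
	const result: WalkResult = { blobIds: new Set(), groupIds: new Set() };
	const visit = (node: TreeNode): void => {
		// A subtree flagged unreferenced by the summarizer is dead: skip it,
		// including any blobs and groupIds nested below it.
		if (node.unreferenced === true) return;
		if (node.groupId !== undefined) result.groupIds.add(node.groupId);
		for (const id of Object.values(node.blobs)) result.blobIds.add(id);
		for (const child of Object.values(node.trees)) visit(child);
	};
	visit(root);
	return result;
}
```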
Covers the reachability filtering and groupId-fetch paths that the local end-to-end tests can't easily exercise (no summarizer runs in the local test server, and TestFluidObjectFactory doesn't produce loading-group datastores out of the box). Tests construct ISnapshotTree fixtures directly and back readBlob/getSnapshot with an in-memory shim. 15 cases across readReferencedSnapshotBlobs, parseGcSnapshotData, captureReferencedAttachmentBlobs, and captureGroupIdSnapshots — covering unreferenced subtree skip, root .blobs special-casing, ISnapshot vs ISnapshotTree input, GC lag tolerance (blobs absent from gcNodes are kept), tombstone/deletedNodes skip, and groupId enumeration/dedup/fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptureFullContainerState The function captures the whole referenced graph of the container (snapshot, loading-group snapshots, inlined structural blobs, inlined attachment blobs, trailing ops) — not just "pending state." The new name matches the scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e.spec Collapse a multi-line chained .get() call onto a single line to satisfy biome's formatter — CI was failing on `biome check .` in local-server-tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inerState Four issues flagged by Copilot review on PR microsoft#27100:
1. captureFullContainerState created an IDocumentService but never called dispose(). Wrap the capture body in try/finally and dispose in the finally to release driver-held resources (sockets/caches).
2. readReferencedSnapshotBlobs fanned every blob read at every tree level into a single Promise.all, giving unbounded concurrency on large snapshots. Refactor into a collect-then-fetch pipeline: walk the tree synchronously to gather referenced blob ids, then fetch via a new mapWithConcurrency helper capped at 32 in-flight reads.
3. captureReferencedAttachmentBlobs had the same unbounded-parallel issue over all referenced attachment storage ids. Route through the same mapWithConcurrency helper.
4. collectUnreferencedBlobLocalIds returned undefined when gcData.gcState was undefined, silently dropping tombstones/deletedNodes filtering even when those lists were populated. This contradicted the function docs. It now always applies tombstones/deletedNodes regardless of gcState presence, and returns a (possibly empty) Set rather than undefined. Added a unit test covering the gcState-undefined-but-tombstones-present case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
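The PR text does not show `mapWithConcurrency` itself. The following is one plausible shape for such a helper, written from scratch as a sketch. At most `limit` mapper calls are in flight at once, and results keep input order. The signature is an assumption.

```typescript
async function mapWithConcurrency<T, R>(
	items: readonly T[],
	limit: number,
	mapper: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
	if (!(limit >= 1)) throw new RangeError("limit must be >= 1");
	const results: R[] = new Array(items.length);
	let next = 0;
	// Each worker repeatedly claims the next unprocessed index. JS runs this
	// code on one thread, so `next++` between awaits cannot race.
	const worker = async (): Promise<void> => {
		while (next < items.length) {
			const i = next++;
			results[i] = await mapper(items[i], i); // store by index to keep input order
		}
	};
	const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
	await Promise.all(workers);
	return results;
}
```

With this shape, the 32-read cap for blobs and a lower cap for whole-snapshot fetches are just different `limit` arguments. If a mapper rejects, `Promise.all` rejects immediately while the remaining workers keep running in the background. A production helper might stop claiming new items after the first failure.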
Follow-up to 8a748cc: captureGroupIdSnapshots still fanned every getSnapshot call into a single Promise.all. Route through mapWithConcurrency with a lower limit (4) since each call pulls a whole snapshot tree, not a single blob. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ip cache Two issues from PR review:
1. Bind getSnapshot when extracting it from the storage service. Real driver implementations reference `this` (e.g., LocalDocumentStorageService.getSnapshot reads this.id), so calling the detached method would TypeError in strict mode. Mirrors the bind pattern in protocolTreeDocumentStorageService.ts:31. Added a class-based unit test stub whose getSnapshot touches `this`, which would have caught this.
2. Pass cacheSnapshot: false on every getSnapshot call we make from the capture path. This capture is transient; we don't want to pollute the driver's snapshot cache with it. Covered by a unit test asserting the option is forwarded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
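The detached-method bug can be reproduced in isolation. `FakeStorage` is a hypothetical class standing in for a driver storage service whose `getSnapshot` reads `this`. Its options parameter is a simplified assumption modeled on the `cacheSnapshot` flag mentioned above. Native class bodies are strict-mode code, so calling the detached method leaves `this` undefined. The read of `this.id` then throws a `TypeError`, which surfaces as a rejected promise because the method is async.

```typescript
// Hypothetical stand-in for a driver storage service.
class FakeStorage {
	constructor(private readonly id: string) {}

	public async getSnapshot(options: { cacheSnapshot?: boolean }): Promise<string> {
		// Touches `this`, like a real driver reading its document id.
		return `${this.id}:cache=${options.cacheSnapshot ?? true}`;
	}
}

const storage = new FakeStorage("doc-1");

// Broken: extracting the method drops its receiver.
const detached = storage.getSnapshot;

// Fixed: bind the receiver when extracting the method.
const bound = storage.getSnapshot.bind(storage);
```

Calling `bound({ cacheSnapshot: false })` resolves to `"doc-1:cache=false"`. Calling `detached(...)` cannot see `"doc-1"`, so it is the variant a class-based test stub catches.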
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (1733 lines, 15 files), I've queued these reviewers:
- Wire-format consts POJO + contract test
- GC-interesting test
- Monitoring context wired (to be reverted)
- API report regenerated
Pull request overview
This PR adds a new driver-only capture API to @fluidframework/container-loader that can produce a portable IPendingContainerState JSON for an attached document without creating a Loader/Container/Runtime, enabling fully-offline frozen-container rehydration scenarios.
Changes:
- Adds `captureFullContainerState` (`@legacy @alpha`) to capture the latest snapshot + post-snapshot ops, inline referenced snapshot blobs, and inline referenced attachment blob bytes (base64) for portability.
- Extends the pending-state wire format with `attachmentBlobContents` (base64) and wires decoding/deduping through load (`SerializedStateManager`) and storage (`PendingLocalStateStore`).
- Adds unit + local-server integration coverage, plus a contract test to detect drift in duplicated wire-format constants.
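Here is why attachment bytes travel as base64 rather than UTF-8 text. Arbitrary binary data is generally not valid UTF-8. A UTF-8 decode/encode round-trip replaces each invalid sequence with U+FFFD, so the blob is silently corrupted, while base64 round-trips exactly. The small demonstration below uses hypothetical helper names and relies only on the standard `btoa`/`atob` and `TextEncoder`/`TextDecoder` globals.

```typescript
function toBase64(bytes: Uint8Array): string {
	let binary = "";
	for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
	return btoa(binary);
}

function fromBase64(b64: string): Uint8Array {
	const binary = atob(b64);
	const out = new Uint8Array(binary.length);
	for (let i = 0; i < binary.length; i++) out[i] = binary.charCodeAt(i);
	return out;
}

// Lossy for binary data: invalid UTF-8 decodes to U+FFFD, which re-encodes
// as the three bytes EF BF BD.
function utf8RoundTrip(bytes: Uint8Array): Uint8Array {
	return new TextEncoder().encode(new TextDecoder().decode(bytes));
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
	if (a.length !== b.length) return false;
	for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) return false;
	return true;
}
```

The trade-off is size: base64 encodes every 3 bytes as 4 characters, so inlined blobs grow by about a third.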
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/loader/container-loader/src/createAndLoadContainerUtils.ts | Implements captureFullContainerState and its props interface; snapshot/op capture and assembly of IPendingContainerState. |
| packages/loader/container-loader/src/captureReferencedContents.ts | New GC-aware snapshot walker + attachment-blob capture helpers and exported wireFormatConstants. |
| packages/loader/container-loader/src/serializedStateManager.ts | Adds attachmentBlobContents to IPendingContainerState and decodes/merges it into the blob cache on load. |
| packages/loader/container-loader/src/pendingLocalStateStore.ts | Dedupes the new attachmentBlobContents map across stored pending states. |
| packages/loader/container-loader/src/containerStorageAdapter.ts | Introduces IBase64BlobContents type to make the base64-vs-utf8 encoding contract explicit. |
| packages/loader/container-loader/src/index.ts | Exports captureFullContainerState, ICaptureFullContainerStateProps, and wireFormatConstants (internal). |
| packages/loader/container-loader/src/test/captureReferencedContents.spec.ts | Unit tests for GC parsing/filtering, referenced-blob walking, attachment-blob behavior, and loading-group detection. |
| packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts | Local-server integration tests for capture → frozen rehydrate, ops after snapshot, nested DDS handles, and binary attachment blobs. |
| packages/test/local-server-tests/src/test/wireFormatConstants.spec.ts | Contract test ensuring loader-duplicated wire-format constants match runtime/runtime-definitions sources. |
| packages/runtime/container-runtime/src/index.ts | Re-exports internal blob-manager wire-format constants for the contract test. |
| packages/runtime/container-runtime/src/blobManager/blobManager.ts | Marks blobManagerBasePath as @internal for extraction/export hygiene. |
| packages/runtime/container-runtime/src/blobManager/blobManagerSnapSum.ts | Marks redirectTableBlobName as @internal for extraction/export hygiene. |
| packages/loader/container-loader/api-report/container-loader.legacy.alpha.api.md | API report update for the new @legacy @alpha export and props interface. |
| .changeset/wide-foxes-behave.md | Changeset for the new API and related internal exports. |
| full-container-state-review-notes.md | Adds detailed review notes / design and coverage tracking document. |
```typescript
const version = versions[0];
const snapshot: ISnapshot | ISnapshotTree | undefined =
	storage.getSnapshot === undefined
		? ((await storage.getSnapshotTree(version)) ?? undefined)
```
```typescript
if (resolvedUrl === undefined) {
	throw new UsageError("Failed to resolve request to a Fluid url");
}
```
```typescript
// Round-trip: the frozen container reads the blob through the cached
// snapshotBlobs entry, confirming the inlined copy is used on load.
```
```typescript
/**
 * Returns true if any referenced subtree of `baseSnapshot` declares a
 * `loadingGroupId`. Subtrees flagged `unreferenced` are skipped — a dead
```
```typescript
 * Ideally these never change, if they do great care will be needed
 * to preserve the correctness of the container-loader code that uses them.
 */
describe("wireFormatConstants contract", () => {
```
This seems OK. Any strong reason not to push them to, like, driver-definitions? I think that's where the other snapshot format keys and interfaces live.
```typescript
	false,
	"captureFullContainerState",
);
const savedOps: ISequencedDocumentMessage[] = [];
```
Deep Review: Post-snapshot blobAttach blobs are not inlined into the captured artifact — offline load will fail to resolve those handles (Tier 2, correctness).
captureFullContainerState populates attachmentBlobContents only via captureReferencedAttachmentBlobs(baseSnapshot, storage, gcData) (captureReferencedContents.ts:235-258), which walks baseSnapshot.trees[".blobs"] and the in-snapshot redirect table. It then drains ops via fetchMessages(attributes.sequenceNumber + 1, …) into savedOps. A blobAttach op carries only metadata.{ localId, blobId } (containerRuntime.ts:2014-2022); replay only rebuilds the redirect table (blobManager.ts:750-775), it never fetches blob bytes. The load-side cache in containerStorageAdapter.ts:241-247 resolves through attachmentBlobContents / snapshotBlobs before falling back to live storage.
Net: a blob uploaded after baseSnapshot but before capture appears in savedOps but its bytes never enter the portable artifact. In a frozen-load scenario without live storage (the artifact's stated purpose), the handle is unresolvable.
The new spec files do not cover this path — there is no test that uploads a blob after the base snapshot, captures, and round-trips through frozen load. The gap is unmonitored in CI.
Suggested fix. While draining savedOps, detect blobAttach messages and inline their metadata.blobId contents into attachmentBlobContents if not already present (respecting the same GC filter applied to base-snapshot blobs).
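The suggested fix could take roughly the following shape; this is a sketch only. While walking the drained ops, it picks out `blobAttach` messages and fetches any `metadata.blobId` not already present. The bytes are stored base64-encoded, like `attachmentBlobContents`. `OpLike`, `readBlob`, and `inlinePostSnapshotBlobs` are hypothetical, and the GC filter mentioned above is omitted for brevity. The flat `op.type === "blobAttach"` check is a simplifying assumption: on the wire, runtime messages may be wrapped, batched, or compressed, so real detection would need to unpack them first.

```typescript
// Hypothetical flattened message shape; see the caveat above.
interface OpLike {
	sequenceNumber: number;
	type: string;
	metadata?: { localId?: string; blobId?: string };
}

function toBase64(bytes: Uint8Array): string {
	let binary = "";
	for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
	return btoa(binary);
}

// Mutates attachmentBlobContents (storage id -> base64 bytes) in place.
async function inlinePostSnapshotBlobs(
	savedOps: readonly OpLike[],
	attachmentBlobContents: Record<string, string>,
	readBlob: (id: string) => Promise<Uint8Array>,
): Promise<void> {
	// Collect blob ids attached after the base snapshot that are not yet inlined.
	const missing = new Set<string>();
	for (const op of savedOps) {
		const blobId = op.type === "blobAttach" ? op.metadata?.blobId : undefined;
		if (blobId !== undefined && !(blobId in attachmentBlobContents)) {
			missing.add(blobId);
		}
	}
	// Sequential for clarity; a bounded-concurrency map fits here as well.
	for (const id of Array.from(missing)) {
		attachmentBlobContents[id] = toBase64(await readBlob(id));
	}
}
```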
Suggested test. In local-server-tests/src/test/captureFullContainerState.spec.ts, add a case that (1) attaches a container, (2) takes a base snapshot, (3) uploads an attachment blob, (4) calls captureFullContainerState, (5) loads via loadFrozenContainerFromPendingState with no live storage, and (6) asserts the handle resolves to the original bytes.
```typescript
 * it does not inline attachment blob contents.
 *
 * On load, entries are decoded from base64 and merged into the same
 * blob cache that `snapshotBlobs` populates.
```
Deep Review: IPendingContainerState serialized wire-format extended without gatekeeper sign-off or forward-compat JSDoc (Tier 2, process + compat).
This PR adds optional attachmentBlobContents?: IBase64BlobContents to IPendingContainerState, reads it here (:281-296), writes it in pendingLocalStateStore.ts:93-118. No schema-version bump and no forward/backward-compat JSDoc note on the field itself.
anthony-murphy on PR #20504 (2024-04-08) established the standing rule: "modifying our serialized format is a big deal, and generally should have a full design review, so finding other ways to do things is generally preferred." Reinforced on PR #20198: "this is serialized state, so we shouldn't change it"; "any existing usage of the serialized data will break."
The blast radius is narrowed by the field being optional and the producing API being @alpha @legacy, but the core compat behavior is non-trivial: an old loader receiving new-producer state silently drops attachmentBlobContents, making attachment blobs unreachable in offline / frozen-load scenarios — the artifact's only purpose.
Suggested fix.
- Tag anthony-murphy and dannimad explicitly on the format extension when leaving draft.
- Add a JSDoc note on `IPendingContainerState.attachmentBlobContents` stating that an old loader silently ignores this field and will fail to read attachment blobs in offline / frozen-load scenarios, and that this is why the producing API is `@alpha`.
- Update the changeset (`wide-foxes-behave.md`) to call out the format extension itself, not only the encoding fix.
🔗 No broken links found! ✅ Your attention to detail is admirable.
Deep Review
Reviewed commit … Readiness: 4/10 (🔨 MAKING PROGRESS)
Not ready for sign-off. The prior round's UTF-8 wire-format defect is resolved via the new …
Path to Ready
Context for Reviewers
For human reviewer
Review history (3 prior reviews)
Adds `captureFullContainerState`, a `@legacy @alpha` free function in `container-loader` that produces an `IPendingContainerState` JSON for an attached document without instantiating a `Loader`, `Container`, or runtime. It drives only `IUrlResolver` + `IDocumentServiceFactory`, fetches the latest snapshot via `getSnapshot`/`getSnapshotTree`, drains ops via `fetchMessages(seq+1, …)`, and inlines blob contents and loading-group snapshots so the artifact is fully portable.
This is the missing piece in the frozen-container series:
- … `asLegacyAlpha` / `ContainerAlpha` surface)
- … `loadFrozenContainerFromPendingState`)
- … `createFrozenDocumentServiceFactory`)

Together with `loadFrozenContainerFromPendingState`, `captureFullContainerState` enables fully-offline frozen-container scenarios.
Design notes
- … `unreferenced: true` subtrees, and the attachment-blob filter parses GC tombstones / deleted nodes with documented GC-lag tolerance (blobs absent from the GC graph are kept).
- … `mapWithConcurrency`: 32 blobs, 4 group snapshots).
- … `captureFullContainerState` throws `UsageError` if any referenced subtree carries a `groupId`. This will be reintroduced when there's a known consumer and e2e harness.
- … `container-runtime`. A cross-package contract test fails CI on drift.
- … blobs since those are text-based payloads.
See also #27100.