feat(container-loader): add captureFullContainerState free function#27220
feat(container-loader): add captureFullContainerState free function#27220markfields merged 29 commits intomicrosoft:mainfrom
Conversation
Adds a driver-only free function that captures a container's current state in the IPendingContainerState wire format using only an IDocumentServiceFactory and IUrlResolver. Unlike Container.getPendingLocalState(), no runtime or codeLoader is instantiated: the function fetches the latest snapshot, reads the authoritative sequence number from the snapshot's attributes blob, drains ops from delta storage from that sequence number, and serializes the result. pendingRuntimeState is undefined, so the output is intended for state relay, inspection, and durable-state snapshot use cases rather than rehydrating in-flight DDS changes. The output can be fed back into loadExistingContainer or loadFrozenContainerFromPendingState as pendingLocalState. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tainerPendingState Extends captureContainerPendingState so the driver-level state is fully self-contained for blob reads as well. Attachment blob bytes are fetched and added to snapshotBlobs keyed by storage ID, which ContainerStorageAdapter already serves through its cache — no wire-format change required. GC state is consulted when present: blobs GC has explicitly marked unreferencedTimestampMs, tombstoned, or deleted are skipped. Blobs absent from the GC graph are kept, since GC lag can leave recently-attached blobs off the graph and dropping them would lose live data. When the snapshot has no GC tree (GC disabled or pre-GC document), every attachment blob from the BlobManager redirect table is included. The relevant blob manager / GC constants and the minimal parsing logic are duplicated locally to avoid a loader → runtime dependency; comments point back to the canonical definitions in container-runtime and runtime-definitions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ware
Extends the driver-only capture to cover the whole referenced graph of the
container, not just attachment blobs.
- Honours ISnapshotTree.unreferenced: a shared tree walker skips any subtree
flagged unreferenced by the summarizer (which sets the flag from GC state)
and inlines contents of every other blob it reaches. Replaces the
unfiltered getBlobContentsFromTree path.
- Pre-fetches loading-group snapshots: enumerates groupIds on the base
snapshot (skipping unreferenced subtrees), fetches each via
IDocumentStorageService.getSnapshot({ versionId, loadingGroupIds }), runs
the fetched snapshot through the same tree walker, and serialises the
result into IPendingContainerState.loadedGroupIdSnapshots. If the driver
lacks getSnapshot support or no groupIds are declared, no groups are
included.
- GC parsing is done once and shared between tree-level and attachment-blob
filtering. captureReferencedAttachmentBlobs now takes pre-parsed GC data.
Renames captureAttachmentBlobs.ts to captureReferencedContents.ts to reflect
the broader scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the reachability filtering and groupId-fetch paths that the local end-to-end tests can't easily exercise (no summarizer runs in the local test server, and TestFluidObjectFactory doesn't produce loading-group datastores out of the box). Tests construct ISnapshotTree fixtures directly and back readBlob/getSnapshot with an in-memory shim. 15 cases across readReferencedSnapshotBlobs, parseGcSnapshotData, captureReferencedAttachmentBlobs, and captureGroupIdSnapshots — covering unreferenced subtree skip, root .blobs special-casing, ISnapshot vs ISnapshotTree input, GC lag tolerance (blobs absent from gcNodes are kept), tombstone/deletedNodes skip, and groupId enumeration/dedup/fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptureFullContainerState The function captures the whole referenced graph of the container (snapshot, loading-group snapshots, inlined structural blobs, inlined attachment blobs, trailing ops) — not just "pending state." The new name matches the scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e.spec Collapse a multi-line chained .get() call onto a single line to satisfy biome's formatter — CI was failing on `biome check .` in local-server-tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inerState Four issues flagged by Copilot review on PR microsoft#27100: 1. captureFullContainerState created an IDocumentService but never called dispose(). Wrap the capture body in try/finally and dispose in the finally to release driver-held resources (sockets/caches). 2. readReferencedSnapshotBlobs fanned every blob read at every tree level into a single Promise.all, giving unbounded concurrency on large snapshots. Refactor into a collect-then-fetch pipeline: walk the tree synchronously to gather referenced blob ids, then fetch via a new mapWithConcurrency helper capped at 32 in-flight reads. 3. captureReferencedAttachmentBlobs had the same unbounded-parallel issue over all referenced attachment storage ids. Route through the same mapWithConcurrency helper. 4. collectUnreferencedBlobLocalIds returned undefined when gcData.gcState was undefined, silently dropping tombstones/deletedNodes filtering even when those lists were populated. Contradicted the function docs. Now always applies tombstones/deletedNodes regardless of gcState presence, and returns a (possibly empty) Set rather than undefined. Added a unit test covering the gcState-undefined-but-tombstones-present case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to 8a748cc: captureGroupIdSnapshots still fanned every getSnapshot call into a single Promise.all. Route through mapWithConcurrency with a lower limit (4) since each call pulls a whole snapshot tree, not a single blob. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ip cache Two issues from PR review: 1. Bind getSnapshot when extracting it from the storage service. Real driver implementations reference `this` (e.g., LocalDocumentStorageService.getSnapshot reads this.id), so calling the detached method would TypeError in strict mode. Mirrors the bind pattern in protocolTreeDocumentStorageService.ts:31. Added a class-based unit test stub whose getSnapshot touches `this` — would have caught this. 2. Pass cacheSnapshot: false on every getSnapshot call we make from the capture path. This capture is transient; we don't want to pollute the driver's snapshot cache with it. Covered by a unit test asserting the option is forwarded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (2014 lines, 13 files), I've queued these reviewers:
How this works
|
- Wire-format consts POJO + contract test - GC-interesting test - Monitoring context wired (to be reverted) - API report regenerated
There was a problem hiding this comment.
Pull request overview
This PR adds a new driver-only capture API to @fluidframework/container-loader that can produce a portable IPendingContainerState JSON for an attached document without creating a Loader/Container/Runtime, enabling fully-offline frozen-container rehydration scenarios.
Changes:
- Adds
captureFullContainerState(@legacy @alpha) to capture the latest snapshot + post-snapshot ops, inline referenced snapshot blobs, and inline referenced attachment blob bytes (base64) for portability. - Extends the pending-state wire format with
attachmentBlobContents(base64) and wires decoding/deduping through load (SerializedStateManager) and storage (PendingLocalStateStore). - Adds unit + local-server integration coverage, plus a contract test to detect drift in duplicated wire-format constants.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/loader/container-loader/src/createAndLoadContainerUtils.ts | Implements captureFullContainerState and its props interface; snapshot/op capture and assembly of IPendingContainerState. |
| packages/loader/container-loader/src/captureReferencedContents.ts | New GC-aware snapshot walker + attachment-blob capture helpers and exported wireFormatConstants. |
| packages/loader/container-loader/src/serializedStateManager.ts | Adds attachmentBlobContents to IPendingContainerState and decodes/merges it into the blob cache on load. |
| packages/loader/container-loader/src/pendingLocalStateStore.ts | Dedupes the new attachmentBlobContents map across stored pending states. |
| packages/loader/container-loader/src/containerStorageAdapter.ts | Introduces IBase64BlobContents type to make the base64-vs-utf8 encoding contract explicit. |
| packages/loader/container-loader/src/index.ts | Exports captureFullContainerState, ICaptureFullContainerStateProps, and wireFormatConstants (internal). |
| packages/loader/container-loader/src/test/captureReferencedContents.spec.ts | Unit tests for GC parsing/filtering, referenced-blob walking, attachment-blob behavior, and loading-group detection. |
| packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts | Local-server integration tests for capture → frozen rehydrate, ops after snapshot, nested DDS handles, and binary attachment blobs. |
| packages/test/local-server-tests/src/test/wireFormatConstants.spec.ts | Contract test ensuring loader-duplicated wire-format constants match runtime/runtime-definitions sources. |
| packages/runtime/container-runtime/src/index.ts | Re-exports internal blob-manager wire-format constants for the contract test. |
| packages/runtime/container-runtime/src/blobManager/blobManager.ts | Marks blobManagerBasePath as @internal for extraction/export hygiene. |
| packages/runtime/container-runtime/src/blobManager/blobManagerSnapSum.ts | Marks redirectTableBlobName as @internal for extraction/export hygiene. |
| packages/loader/container-loader/api-report/container-loader.legacy.alpha.api.md | API report update for the new @legacy @alpha export and props interface. |
| .changeset/wide-foxes-behave.md | Changeset for the new API and related internal exports. |
| full-container-state-review-notes.md | Adds detailed review notes / design and coverage tracking document. |
| return { tree: snapshot, read: async (id) => storage.readBlob(id) }; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
Deep Review: UTF-8/base64 encoding split — process sign-off ask, no code change.
captureReferencedAttachmentBlobs writes base64 (bufferToString(buffer, "base64")) into a new IBase64BlobContents; readReferencedSnapshotBlobs writes UTF-8 into ISerializableBlobContents (serializedStateManager.ts:284-299, decode in containerStorageAdapter.ts:515-527). The split is technically correct and aligns with the runtime's existing pending-blob serializer (blobManager.ts:340-343, :951-960); the non-UTF-8 round-trip e2e test at captureFullContainerState.spec.ts:351-427 covers the binary case.
Process concern: vladsud's "we should stick to one format across layers for blobs" position from PR #20507 (with anthony-murphy explicitly looped in) hasn't been revisited on the record. vladsud is already requested; please also tag anthony-murphy so the area's "one format" stance is consciously superseded rather than silently bypassed. The binary-safety argument for the split is sound — no code change expected.
| * Optional logger for driver-side telemetry. | ||
| */ | ||
| readonly logger?: ITelemetryBaseLogger | undefined; | ||
| } |
There was a problem hiding this comment.
Deep Review: Add abortSignal?: AbortSignal to ICaptureFullContainerStateProps before @alpha graduation.
ICaptureFullContainerStateProps exposes only urlResolver, documentServiceFactory, request, logger. The capture body concurrently reads snapshot/attachment blobs (bounded concurrency 32) and drains the op stream without a cancellation check. IDocumentDeltaStorageService.fetchMessages already accepts abortSignal?: AbortSignal (packages/common/driver-definitions/src/storage.ts:103-109), so the underlying primitives support it.
Suggested fix: add abortSignal?: AbortSignal to ICaptureFullContainerStateProps, propagate it into the mapWithConcurrency blob/group loops and the fetchMessages op-drain loop, and call signal.throwIfAborted() between bounded batches. Additive and cheaper to land before the API graduates from @alpha.
There was a problem hiding this comment.
This is a good idea, adding it to follow-ups mentioned in PR description
There was a problem hiding this comment.
Deep Review: Thanks — accepting as a follow-up is reasonable given the API is still @alpha @legacy. One ask: gate the follow-up to land before graduation. The prop is additive, so it's cheaper now than once the surface is frozen; please link the ADO item here when filed and resolve the thread.
| attachmentBlobContents, | ||
| ); | ||
| Object.assign(attachmentBlobContents, added); | ||
| } |
There was a problem hiding this comment.
Deep Review: Op-stream drain trusts the driver silently — assert contiguity to fail closed.
captureFullContainerState drains connectToDeltaStorage().fetchMessages(attributes.sequenceNumber + 1, …) into savedOps and emits whatever the driver yields. Other parts of the same function fail-closed on suspicious states (UsageError on loading groups; GenericError on empty snapshot fetch); ops are the one place that trusts the driver silently. A buggy driver / network truncation would produce an artifact whose ops cannot be replayed, surfacing as a confusing rehydration-time failure pointing at the loader rather than at the capture.
Suggested fix: track a previous-sequence-number across the drain loop and assert each op's sequenceNumber === prev + 1; throw on a gap with the offending pair in the message. Optional: also assert the snapshot's latestSequenceNumber >= attributes.sequenceNumber at fetch time.
There was a problem hiding this comment.
Another good point. A downside to not using the DeltaManager code in the container loader, I guess. Tracking as a follow-up.
There was a problem hiding this comment.
Deep Review: Thanks for tracking this. The contiguity assert keeps the artifact's replay invariants honest at capture time and matches the fail-closed posture used elsewhere in captureFullContainerState (UsageError on loading groups, GenericError on empty snapshot fetch). Please link the follow-up ADO item here when filed and resolve the thread; ideally land before @alpha graduation alongside #3204969959.
| gcBlobPrefix: "__gc", | ||
| gcTombstoneBlobKey: "__tombstones", | ||
| gcDeletedBlobKey: "__deletedNodes", | ||
| } as const; |
There was a problem hiding this comment.
Deep Review: Loader↔runtime constant duplication — gatekeeper nod ask, no code change.
This PR introduces a wireFormatConstants block in the loader mirroring seven runtime-side constants (.blobs, .redirectTable, _blobs, gc, __gc, __tombstones, __deletedNodes) and a cross-package deepStrictEqual contract test (wireFormatConstants.spec.ts). PR #21729 (anthony-murphy, 2024-07-02) was the deliberate refactor that produced these constants in their current form, with markfields reviewing for path-format discipline. This PR is the first cross-layer consumer; the chosen pattern (duplicate + contract test) sets precedent for any future shared wire-format constant.
Tag anthony-murphy for an explicit nod on the duplicate+contract-test layering choice, or merge with a clear note that AB#72934 will revisit (per thread #3192306888). No code change expected.
| urlResolver, | ||
| documentServiceFactory, | ||
| request, |
There was a problem hiding this comment.
@anthony-murphy thinking about replacing these 3 params with just documentService and having caller piece that together (maybe in follow-up PR). Does that work? I don't totally understand the environment in which this will be called and where these params will come from
There was a problem hiding this comment.
its tricky. overall i would love to move in a direction where fluid takes a document service, but it could be hard right now due lack of usage that way. there won't be any app examples, and i'm not sure the odsp driver supports it. i would be 100% down to support both maybe via overload or union to start moving in that direction, as we need to start somewhere
There was a problem hiding this comment.
I'd vote for this direction. I think we can expose both - the core function that takes instance, and wrapper that takes factory. Might need to make some changes to the service interface. For examlpe, it would be nice to get resolved URI from service (assuming it's not exposed today) and thus get rid of URI resolver dependency as well
| * runtime authors) would corrupt non-UTF-8 byte sequences with | ||
| * replacement characters. Populated by `captureFullContainerState`; the | ||
| * live container's pending-state path leaves this `undefined` because | ||
| * it does not inline attachment blob contents. |
There was a problem hiding this comment.
It's probably much larger change (and should be considered on its own), but I wonder if something like directory / map should support arrays as payloads? In addition to strings. But that would likely require massive changes across whole stack.
| try { | ||
| const storage = await documentService.connectToStorage(); | ||
|
|
||
| const versions = await storage.getVersions( |
There was a problem hiding this comment.
any opportunities to dedup some code with Container.load()?
| documentServiceFactory, | ||
| request, | ||
| logger, | ||
| }: ICaptureFullContainerStateProps): Promise<string> { |
There was a problem hiding this comment.
If I got it right, this flow does not have websocket connection and fetching ops from websocket.
It might be enough for prototype, but we would need ops from socket (replace with REST call once/if we have such API from SPO in the future).
There was a problem hiding this comment.
L390 we connect to delta storage and fetch messages. I'll have to look at what that's doing... is it possible it's already that REST API we're talking about? I forget and on mobile at the moment.
There was a problem hiding this comment.
Ohhh that's Storage's deltas, not PUSH, right? So would be missing the very latest messages?
| @@ -0,0 +1,388 @@ | |||
| /*! | |||
There was a problem hiding this comment.
Do we need some kind of stress test system here? I.e. a collection of random snapshots that go through new code path vs. full container load (with container runtime instantiation) and check that they produce same result in terms of ops fetched, blobs captured?
That assumes that we have equvalent flow for full container load (which we do not), so maybe we can't imlement it directly. But maybe some parts of it can be implemented, like asking GC subsystem for referenced blobs and compare that with what container-loader layer computed.
I just worry that as we make changes in runtime, they may not reflect in this layer.
There was a problem hiding this comment.
this could be integrated into local-server-stress, which already does pending state capture from live containers
| "@fluidframework/container-runtime": minor | ||
| "__section": feature | ||
| --- | ||
| Add a new free function `captureFullContainerState` for serializing a container without loading it, for rehydrating into a future session. |
There was a problem hiding this comment.
| Add a new free function `captureFullContainerState` for serializing a container without loading it, for rehydrating into a future session. | |
| Serialize a container without loading it for rehydrating a future session with `captureFullContainerState` |
I would mention that it's a free function somewhere in the longer description as opposed to the title.
There was a problem hiding this comment.
I'm going to drop the changeset, we don't need to advertise this at the moment. We can do so when promoting to beta.
|
🔗 No broken links found! ✅ Your attention to detail is admirable. linkcheck output |
Deep ReviewReviewed commit Readiness: 8/10 — 🟢 ALMOST READY No Tier 1 or Tier 2 issues remain. The two newest concrete asks ( Path to Ready
Context for Reviewers
For human reviewer
Review history (6 prior reviews)
|
| return added; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
Deep Review: Loading-group fail-fast — confirm follow-up tracking against a target consumer.
snapshotHasLoadingGroups(baseSnapshot) throws UsageError whenever any referenced subtree carries a groupId, and captureFullContainerState leaves loadedGroupIdSnapshots: undefined. Fail-fast is the right interim for an @alpha @legacy surface — preferable to emitting an incomplete artifact — and consistent with PR #20565's pending-state model for loading groups (tyler-cai-microsoft).
No code change requested. Two asks:
- Confirm the deferral is tracked against an open ADO/work item with a target consumer. The PR risk-section text alone ("no known consumer + no e2e harness") does not auto-prioritize; once a consumer surfaces, the per-
groupIdstorage.getSnapshot({ loadingGroupIds: [groupId], cacheSnapshot: false })path needs to follow PR [Data Virtualization & Offline]: Reimplement groupId offline with the loader #20565's storage convention rather than inventing a new one. - Tag tyler-cai-microsoft to confirm the parity gap with PR [Data Virtualization & Offline]: Reimplement groupId offline with the loader #20565 is acceptable scope for v1.
Adds
captureFullContainerState, a@legacy @alphafree function incontainer-loaderthatproduces an
IPendingContainerStateJSON for an attached document without instantiating aLoader, Container, or runtime. It drives only
IUrlResolver+IDocumentServiceFactory,fetches the latest snapshot via
getSnapshot/getSnapshotTree, drains ops viafetchMessages(seq+1, …), and inlines blob contents and loading-group snapshots so theartifact is fully portable.
This is the missing piece in the frozen-container series:
asLegacyAlpha/ContainerAlphasurface)loadFrozenContainerFromPendingState)createFrozenDocumentServiceFactory)Together with
loadFrozenContainerFromPendingState,captureFullContainerStateenablesfully-offline frozen-container scenarios.
Design notes
unreferenced: truesubtrees, and theattachment-blob filter parses GC tombstones / deleted-nodes with documented GC-lag
tolerance (blobs absent from the GC graph are kept).
mapWithConcurrency: 32 blobs, 4 group snapshots).captureFullContainerStatethrows
UsageErrorif any referenced subtree carries agroupId. This will be reintroduced whenthere's a known consumer and e2e harness.
container-runtime. Across-package contract test fails CI on drift.
blobs since those are text-based payloads.
See also #27100, and this markdown file used during branch development,
Follow-ups
See this list of follow-ups accrued during development. These will be moved to ADO tasks once the PR is merged.
Additionally, these from Deep Review on a later iteration:
And some of my own thoughts on final review pass:
loadExistingContainerandloadFrozenContainerFromPendingState?