feat(container-loader): add captureFullContainerState free function#27220

Open
markfields wants to merge 21 commits into microsoft:main from markfields:full-container-state
Conversation

@markfields
Member

@markfields markfields commented May 1, 2026

Adds captureFullContainerState, a @legacy @alpha free function in container-loader that
produces an IPendingContainerState JSON for an attached document without instantiating a
Loader, Container, or runtime. It drives only IUrlResolver + IDocumentServiceFactory,
fetches the latest snapshot via getSnapshot/getSnapshotTree, drains ops via
fetchMessages(seq+1, …), and inlines blob contents and loading-group snapshots so the
artifact is fully portable.

This is the missing piece in the frozen-container series:

Together with loadFrozenContainerFromPendingState, captureFullContainerState enables
fully-offline frozen-container scenarios.

Design notes

  • GC respect is two-layered: snapshot tree walk skips unreferenced: true subtrees, and the
    attachment-blob filter parses GC tombstones / deleted-nodes with documented GC-lag
    tolerance (blobs absent from the GC graph are kept).
  • Concurrency is bounded (mapWithConcurrency: 32 blobs, 4 group snapshots).
  • Loading-group snapshots are intentionally NOT proactively captured — captureFullContainerState
    throws UsageError if any referenced subtree carries a groupId. This will be reintroduced when
    there's a known consumer and e2e harness.
  • Some snapshot/GC/blob-tree constants are duplicated from container-runtime. A
    cross-package contract test fails CI on drift.
  • Attachment-blob payloads are encoded via base64, which is different from existing inlined
    blobs since those are text-based payloads.
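
The loading-group rule in the notes above can be pictured with a small sketch. It is an illustration only, not the PR's code: `SketchTree` is a simplified stand-in for a snapshot tree node, and `SketchUsageError` is a hypothetical substitute for `UsageError`.

```typescript
// Simplified stand-in for a snapshot tree node (assumption, not the real ISnapshotTree).
interface SketchTree {
	trees: Record<string, SketchTree>;
	groupId?: string;
	unreferenced?: true;
}

// Hypothetical error type standing in for UsageError.
class SketchUsageError extends Error {}

// Returns true if any referenced subtree declares a groupId.
function hasReferencedLoadingGroup(tree: SketchTree): boolean {
	if (tree.unreferenced === true) {
		return false; // dead subtree: a groupId below it is ignored
	}
	if (tree.groupId !== undefined) {
		return true;
	}
	return Object.keys(tree.trees).some((key) => {
		const child = tree.trees[key];
		return child !== undefined && hasReferencedLoadingGroup(child);
	});
}

// Mirrors the described behavior: refuse to capture when a live group exists.
function assertNoLoadingGroups(tree: SketchTree): void {
	if (hasReferencedLoadingGroup(tree)) {
		throw new SketchUsageError("Loading-group snapshots are not captured");
	}
}
```

A groupId under an `unreferenced` subtree does not trigger the error in this sketch, which matches the "referenced subtree" wording of the note.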

See also #27100.

anthony-murphy and others added 10 commits April 20, 2026 10:48
Adds a driver-only free function that captures a container's current state
in the IPendingContainerState wire format using only an IDocumentServiceFactory
and IUrlResolver. Unlike Container.getPendingLocalState(), no runtime or
codeLoader is instantiated: the function fetches the latest snapshot, reads
the authoritative sequence number from the snapshot's attributes blob, drains
ops from delta storage from that sequence number, and serializes the result.
pendingRuntimeState is undefined, so the output is intended for state relay,
inspection, and durable-state snapshot use cases rather than rehydrating
in-flight DDS changes. The output can be fed back into loadExistingContainer
or loadFrozenContainerFromPendingState as pendingLocalState.
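
As a rough sketch of the "drain ops from the snapshot's sequence number" step: the types below are simplified stand-ins, and `fetchFrom` is a hypothetical in-memory substitute for a driver's delta storage, not a real driver call.

```typescript
// Stand-ins for the driver types; these shapes are assumptions for illustration.
interface SketchOp {
	sequenceNumber: number;
}
type SketchStreamResult<T> = { done: true } | { done: false; value: T };
interface SketchStream<T> {
	read(): Promise<SketchStreamResult<T>>;
}

// Reads batches until the stream reports done, concatenating them in order.
async function drainOps(stream: SketchStream<SketchOp[]>): Promise<SketchOp[]> {
	const saved: SketchOp[] = [];
	for (;;) {
		const result = await stream.read();
		if (result.done) {
			return saved;
		}
		saved.push(...result.value);
	}
}

// Hypothetical in-memory op log: serves ops with sequenceNumber >= from, in batches.
function fetchFrom(log: SketchOp[], from: number, batchSize: number): SketchStream<SketchOp[]> {
	let pending = log.filter((op) => op.sequenceNumber >= from);
	return {
		read: async (): Promise<SketchStreamResult<SketchOp[]>> => {
			if (pending.length === 0) {
				return { done: true };
			}
			const value = pending.slice(0, batchSize);
			pending = pending.slice(batchSize);
			return { done: false, value };
		},
	};
}
```

With a snapshot at sequence number 2, calling `drainOps(fetchFrom(log, 3, 2))` would collect every op from 3 onward, which is the `seq + 1` starting point the description mentions.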

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tainerPendingState

Extends captureContainerPendingState so the driver-level state is fully
self-contained for blob reads as well. Attachment blob bytes are fetched and
added to snapshotBlobs keyed by storage ID, which ContainerStorageAdapter
already serves through its cache — no wire-format change required.

GC state is consulted when present: blobs GC has explicitly marked
unreferencedTimestampMs, tombstoned, or deleted are skipped. Blobs absent
from the GC graph are kept, since GC lag can leave recently-attached blobs
off the graph and dropping them would lose live data. When the snapshot has
no GC tree (GC disabled or pre-GC document), every attachment blob from the
BlobManager redirect table is included.
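
The filtering rule described above might look roughly like the following. `SketchGcData` is a hypothetical, flattened view of parsed GC data; the real summary format and node-path conventions are not reproduced here.

```typescript
// Hypothetical, simplified view of parsed GC data (assumption for illustration).
interface SketchGcData {
	// Node id -> details; the timestamp is set once GC marks the node unreferenced.
	gcNodes?: Record<string, { unreferencedTimestampMs?: number }>;
	tombstones?: string[];
	deletedNodes?: string[];
}

// Collects the ids that GC has explicitly ruled out.
function collectUnreferencedIds(gc: SketchGcData | undefined): Set<string> {
	const skip = new Set<string>();
	if (gc === undefined) {
		return skip; // no GC tree: every blob is kept
	}
	for (const id of gc.tombstones ?? []) skip.add(id);
	for (const id of gc.deletedNodes ?? []) skip.add(id);
	const nodes = gc.gcNodes ?? {};
	for (const id of Object.keys(nodes)) {
		const node = nodes[id];
		if (node !== undefined && node.unreferencedTimestampMs !== undefined) {
			skip.add(id);
		}
	}
	return skip;
}

// Blobs absent from the GC graph are kept, to tolerate GC lag.
function filterReferenced(ids: string[], gc: SketchGcData | undefined): string[] {
	const skip = collectUnreferencedIds(gc);
	return ids.filter((id) => !skip.has(id));
}
```

The design choice is deliberately asymmetric: only an explicit GC verdict removes a blob, so a recently attached blob that GC has not yet seen survives the capture.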

The relevant blob manager / GC constants and the minimal parsing logic are
duplicated locally to avoid a loader → runtime dependency; comments point
back to the canonical definitions in container-runtime and
runtime-definitions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ware

Extends the driver-only capture to cover the whole referenced graph of the
container, not just attachment blobs.

- Honours ISnapshotTree.unreferenced: a shared tree walker skips any subtree
  flagged unreferenced by the summarizer (which sets the flag from GC state)
  and inlines contents of every other blob it reaches. Replaces the
  unfiltered getBlobContentsFromTree path.
- Pre-fetches loading-group snapshots: enumerates groupIds on the base
  snapshot (skipping unreferenced subtrees), fetches each via
  IDocumentStorageService.getSnapshot({ versionId, loadingGroupIds }), runs
  the fetched snapshot through the same tree walker, and serialises the
  result into IPendingContainerState.loadedGroupIdSnapshots. If the driver
  lacks getSnapshot support or no groupIds are declared, no groups are
  included.
- GC parsing is done once and shared between tree-level and attachment-blob
  filtering. captureReferencedAttachmentBlobs now takes pre-parsed GC data.
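
A minimal sketch of the reachability-aware walk, assuming a simplified tree shape (`SketchSnapshotTree` is a stand-in, and the root `.blobs` special-casing mentioned elsewhere in this PR is omitted):

```typescript
// Simplified stand-in for ISnapshotTree (assumption for illustration).
interface SketchSnapshotTree {
	blobs: Record<string, string>; // path -> blob id
	trees: Record<string, SketchSnapshotTree>;
	unreferenced?: true;
}

// Walks the tree synchronously and gathers blob ids, skipping dead subtrees.
function collectReferencedBlobIds(
	tree: SketchSnapshotTree,
	out: Set<string> = new Set<string>(),
): Set<string> {
	if (tree.unreferenced === true) {
		return out; // nothing under an unreferenced subtree is collected
	}
	for (const path of Object.keys(tree.blobs)) {
		const id = tree.blobs[path];
		if (id !== undefined) out.add(id);
	}
	for (const name of Object.keys(tree.trees)) {
		const child = tree.trees[name];
		if (child !== undefined) collectReferencedBlobIds(child, out);
	}
	return out;
}
```

Collecting ids first and fetching afterwards keeps the walk itself free of I/O, so the fetch step can be throttled independently.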

Renames captureAttachmentBlobs.ts to captureReferencedContents.ts to reflect
the broader scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the reachability filtering and groupId-fetch paths that the local
end-to-end tests can't easily exercise (no summarizer runs in the local
test server, and TestFluidObjectFactory doesn't produce loading-group
datastores out of the box). Tests construct ISnapshotTree fixtures
directly and back readBlob/getSnapshot with an in-memory shim.

15 cases across readReferencedSnapshotBlobs, parseGcSnapshotData,
captureReferencedAttachmentBlobs, and captureGroupIdSnapshots — covering
unreferenced subtree skip, root .blobs special-casing, ISnapshot vs
ISnapshotTree input, GC lag tolerance (blobs absent from gcNodes are
kept), tombstone/deletedNodes skip, and groupId enumeration/dedup/fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptureFullContainerState

The function captures the whole referenced graph of the container (snapshot,
loading-group snapshots, inlined structural blobs, inlined attachment blobs,
trailing ops) — not just "pending state." The new name matches the scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e.spec

Collapse a multi-line chained .get() call onto a single line to satisfy
biome's formatter — CI was failing on `biome check .` in local-server-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inerState

Four issues flagged by Copilot review on PR microsoft#27100:

1. captureFullContainerState created an IDocumentService but never called
   dispose(). Wrap the capture body in try/finally and dispose in the
   finally to release driver-held resources (sockets/caches).

2. readReferencedSnapshotBlobs fanned every blob read at every tree level
   into a single Promise.all, giving unbounded concurrency on large
   snapshots. Refactor into a collect-then-fetch pipeline: walk the tree
   synchronously to gather referenced blob ids, then fetch via a new
   mapWithConcurrency helper capped at 32 in-flight reads.

3. captureReferencedAttachmentBlobs had the same unbounded-parallel issue
   over all referenced attachment storage ids. Route through the same
   mapWithConcurrency helper.

4. collectUnreferencedBlobLocalIds returned undefined when gcData.gcState
   was undefined, silently dropping tombstones/deletedNodes filtering even
   when those lists were populated. Contradicted the function docs. Now
   always applies tombstones/deletedNodes regardless of gcState presence,
   and returns a (possibly empty) Set rather than undefined. Added a unit
   test covering the gcState-undefined-but-tombstones-present case.
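
For readers unfamiliar with the bounded-concurrency pattern referred to in items 2 and 3, one common way to write such a helper is a fixed pool of workers pulling from a shared index. This is a generic sketch under that assumption; `boundedMap` is a hypothetical name and is not the PR's `mapWithConcurrency` implementation.

```typescript
// Maps items through fn with at most `limit` calls in flight; results keep input order.
async function boundedMap<T, R>(
	items: readonly T[],
	limit: number,
	fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
	const results: R[] = new Array<R>(items.length);
	let next = 0;
	// Each worker claims the next unprocessed index until none remain.
	const worker = async (): Promise<void> => {
		while (next < items.length) {
			const index = next++;
			results[index] = await fn(items[index] as T, index);
		}
	};
	const workerCount = Math.max(1, Math.min(limit, items.length));
	await Promise.all(Array.from({ length: workerCount }, () => worker()));
	return results;
}
```

In this sketch a rejection from `fn` rejects the returned promise, but workers already running continue until their current call settles; a production helper may want stricter cancellation.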

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to 8a748cc: captureGroupIdSnapshots still fanned every
getSnapshot call into a single Promise.all. Route through
mapWithConcurrency with a lower limit (4) since each call pulls a whole
snapshot tree, not a single blob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ip cache

Two issues from PR review:

1. Bind getSnapshot when extracting it from the storage service. Real
   driver implementations reference `this` (e.g.,
   LocalDocumentStorageService.getSnapshot reads this.id), so calling
   the detached method would TypeError in strict mode. Mirrors the
   bind pattern in protocolTreeDocumentStorageService.ts:31. Added a
   class-based unit test stub whose getSnapshot touches `this` — would
   have caught this.

2. Pass cacheSnapshot: false on every getSnapshot call we make from the
   capture path. This capture is transient; we don't want to pollute the
   driver's snapshot cache with it. Covered by a unit test asserting the
   option is forwarded.
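
The `this`-binding issue in item 1 can be reproduced in isolation. `SketchStorageService` below is a hypothetical class, not a real driver; it only shows why a detached method reference fails while a bound one works.

```typescript
// Hypothetical storage service whose method depends on `this`.
class SketchStorageService {
	private readonly id = "doc-1";
	public async getSnapshot(options: { cacheSnapshot: boolean }): Promise<string> {
		return `${this.id}:cache=${options.cacheSnapshot}`;
	}
}

// Returns both a detached and a bound reference to the same method.
function extractGetSnapshot(service: SketchStorageService) {
	return {
		// Detached: `this` is lost once the reference is called on its own.
		detached: service.getSnapshot,
		// Bound: `this` stays attached to the service instance.
		bound: service.getSnapshot.bind(service),
	};
}
```

Because the method is async, calling the detached reference on its own surfaces the failure as a rejected promise rather than a synchronous throw. Passing `{ cacheSnapshot: false }` through the bound reference mirrors item 2.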

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (1733 lines, 15 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
markfields added 7 commits May 5, 2026 20:33
- Wire-format consts POJO + contract test
- GC-interesting test
- Monitoring context wired (to be reverted)
- API report regenerated
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
@markfields markfields marked this pull request as ready for review May 6, 2026 00:01
Copilot AI review requested due to automatic review settings May 6, 2026 00:01
@markfields markfields requested a review from a team as a code owner May 6, 2026 00:01
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new driver-only capture API to @fluidframework/container-loader that can produce a portable IPendingContainerState JSON for an attached document without creating a Loader/Container/Runtime, enabling fully-offline frozen-container rehydration scenarios.

Changes:

  • Adds captureFullContainerState (@legacy @alpha) to capture the latest snapshot + post-snapshot ops, inline referenced snapshot blobs, and inline referenced attachment blob bytes (base64) for portability.
  • Extends the pending-state wire format with attachmentBlobContents (base64) and wires decoding/deduping through load (SerializedStateManager) and storage (PendingLocalStateStore).
  • Adds unit + local-server integration coverage, plus a contract test to detect drift in duplicated wire-format constants.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/loader/container-loader/src/createAndLoadContainerUtils.ts | Implements captureFullContainerState and its props interface; snapshot/op capture and assembly of IPendingContainerState. |
| packages/loader/container-loader/src/captureReferencedContents.ts | New GC-aware snapshot walker + attachment-blob capture helpers and exported wireFormatConstants. |
| packages/loader/container-loader/src/serializedStateManager.ts | Adds attachmentBlobContents to IPendingContainerState and decodes/merges it into the blob cache on load. |
| packages/loader/container-loader/src/pendingLocalStateStore.ts | Dedupes the new attachmentBlobContents map across stored pending states. |
| packages/loader/container-loader/src/containerStorageAdapter.ts | Introduces IBase64BlobContents type to make the base64-vs-utf8 encoding contract explicit. |
| packages/loader/container-loader/src/index.ts | Exports captureFullContainerState, ICaptureFullContainerStateProps, and wireFormatConstants (internal). |
| packages/loader/container-loader/src/test/captureReferencedContents.spec.ts | Unit tests for GC parsing/filtering, referenced-blob walking, attachment-blob behavior, and loading-group detection. |
| packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts | Local-server integration tests for capture → frozen rehydrate, ops after snapshot, nested DDS handles, and binary attachment blobs. |
| packages/test/local-server-tests/src/test/wireFormatConstants.spec.ts | Contract test ensuring loader-duplicated wire-format constants match runtime/runtime-definitions sources. |
| packages/runtime/container-runtime/src/index.ts | Re-exports internal blob-manager wire-format constants for the contract test. |
| packages/runtime/container-runtime/src/blobManager/blobManager.ts | Marks blobManagerBasePath as @internal for extraction/export hygiene. |
| packages/runtime/container-runtime/src/blobManager/blobManagerSnapSum.ts | Marks redirectTableBlobName as @internal for extraction/export hygiene. |
| packages/loader/container-loader/api-report/container-loader.legacy.alpha.api.md | API report update for the new @legacy @alpha export and props interface. |
| .changeset/wide-foxes-behave.md | Changeset for the new API and related internal exports. |
| full-container-state-review-notes.md | Adds detailed review notes / design and coverage tracking document. |

const version = versions[0];
const snapshot: ISnapshot | ISnapshotTree | undefined =
storage.getSnapshot === undefined
? ((await storage.getSnapshotTree(version)) ?? undefined)
Comment on lines +336 to +338
if (resolvedUrl === undefined) {
throw new UsageError("Failed to resolve request to a Fluid url");
}
Comment on lines +283 to +284
// Round-trip: the frozen container reads the blob through the cached
// snapshotBlobs entry, confirming the inlined copy is used on load.

/**
* Returns true if any referenced subtree of `baseSnapshot` declares a
* `loadingGroupId`. Subtrees flagged `unreferenced` are skipped — a dead
* Ideally these never change, if they do great care will be needed
* to preserve the correctness of the container-loader code that uses them.
*/
describe("wireFormatConstants contract", () => {
Contributor

This seems OK. Any strong reason not to push them to something like driver-definitions? I think that's where the other snapshot format keys and interfaces live.

false,
"captureFullContainerState",
);
const savedOps: ISequencedDocumentMessage[] = [];
Contributor

Deep Review: Post-snapshot blobAttach blobs are not inlined into the captured artifact — offline load will fail to resolve those handles (Tier 2, correctness).

captureFullContainerState populates attachmentBlobContents only via captureReferencedAttachmentBlobs(baseSnapshot, storage, gcData) (captureReferencedContents.ts:235-258), which walks baseSnapshot.trees[".blobs"] and the in-snapshot redirect table. It then drains ops via fetchMessages(attributes.sequenceNumber + 1, …) into savedOps. A blobAttach op carries only metadata.{ localId, blobId } (containerRuntime.ts:2014-2022); replay only rebuilds the redirect table (blobManager.ts:750-775), it never fetches blob bytes. The load-side cache in containerStorageAdapter.ts:241-247 resolves through attachmentBlobContents / snapshotBlobs before falling back to live storage.

Net: a blob uploaded after baseSnapshot but before capture appears in savedOps but its bytes never enter the portable artifact. In a frozen-load scenario without live storage (the artifact's stated purpose), the handle is unresolvable.

The new spec files do not cover this path — there is no test that uploads a blob after the base snapshot, captures, and round-trips through frozen load. The gap is unmonitored in CI.

Suggested fix. While draining savedOps, detect blobAttach messages and inline their metadata.blobId contents into attachmentBlobContents if not already present (respecting the same GC filter applied to base-snapshot blobs).

Suggested test. In local-server-tests/src/test/captureFullContainerState.spec.ts, add a case that (1) attaches a container, (2) takes a base snapshot, (3) uploads an attachment blob, (4) calls captureFullContainerState, (5) loads via loadFrozenContainerFromPendingState with no live storage, and (6) asserts the handle resolves to the original bytes.
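
The suggested fix could start from a scan like the one below. The message shape is an assumption based only on the description in this comment (a `blobAttach` op carrying `metadata.blobId`); the real wire shape may wrap these fields differently, and GC filtering is left out.

```typescript
// Assumed op shape, taken from the review comment's description (not the real type).
interface SketchMessage {
	type: string;
	sequenceNumber: number;
	metadata?: { localId?: string; blobId?: string };
}

// Returns storage ids referenced by blobAttach ops that are not yet inlined.
function findMissingAttachedBlobIds(
	savedOps: readonly SketchMessage[],
	alreadyInlined: ReadonlySet<string>,
): string[] {
	const missing = new Set<string>();
	for (const op of savedOps) {
		const blobId = op.type === "blobAttach" ? op.metadata?.blobId : undefined;
		if (blobId !== undefined && !alreadyInlined.has(blobId)) {
			missing.add(blobId); // the Set removes duplicates across repeated ops
		}
	}
	return Array.from(missing);
}
```

The returned ids would then be fetched and added to the inlined contents, ideally through the same bounded-concurrency path used for base-snapshot blobs.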

* it does not inline attachment blob contents.
*
* On load, entries are decoded from base64 and merged into the same
* blob cache that `snapshotBlobs` populates.
Contributor

Deep Review: IPendingContainerState serialized wire-format extended without gatekeeper sign-off or forward-compat JSDoc (Tier 2, process + compat).

This PR adds optional attachmentBlobContents?: IBase64BlobContents to IPendingContainerState, reads it here (:281-296), writes it in pendingLocalStateStore.ts:93-118. No schema-version bump and no forward/backward-compat JSDoc note on the field itself.

anthony-murphy on PR #20504 (2024-04-08) established the standing rule: "modifying our serialized format is a big deal, and generally should have a full design review, so finding other ways to do things is generally preferred." Reinforced on PR #20198: "this is serialized state, so we shouldn't change it"; "any existing usage of the serialized data will break."

The blast radius is narrowed by the field being optional and the producing API being @alpha @legacy, but the core compat behavior is non-trivial: an old loader receiving new-producer state silently drops attachmentBlobContents, making attachment blobs unreachable in offline / frozen-load scenarios — the artifact's only purpose.

Suggested fix.

  1. Tag anthony-murphy and dannimad explicitly on the format extension when leaving draft.
  2. Add a JSDoc note on IPendingContainerState.attachmentBlobContents that an old loader silently ignores this field and will fail to read attachment blobs in offline / frozen-load scenarios — and that's why the producing API is @alpha.
  3. Update the changeset (wide-foxes-behave.md) to call out the format extension itself, not only the encoding fix.
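
The compat behavior described above can be illustrated with a toy reader pair. The interfaces are hypothetical reductions of the pending-state shape to two fields; they only show how an optional field is invisible to a reader that predates it.

```typescript
// Hypothetical reductions of the serialized state (assumptions for illustration).
interface SketchStateOld {
	snapshotBlobs: Record<string, string>;
}
interface SketchStateNew extends SketchStateOld {
	attachmentBlobContents?: Record<string, string>;
}

// An older reader only knows snapshotBlobs; extra fields are silently ignored.
function oldLoaderBlobIds(json: string): string[] {
	const state = JSON.parse(json) as SketchStateOld;
	return Object.keys(state.snapshotBlobs);
}

// A newer reader merges both maps and tolerates the field being absent.
function newLoaderBlobIds(json: string): string[] {
	const state = JSON.parse(json) as SketchStateNew;
	return [
		...Object.keys(state.snapshotBlobs),
		...Object.keys(state.attachmentBlobContents ?? {}),
	];
}
```

New-reader/old-producer works because the field is optional; old-reader/new-producer parses without error but loses the attachment blobs, which is the silent failure the JSDoc note is meant to document.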

@github-actions
Contributor

github-actions Bot commented May 6, 2026

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  288641 links
    1922 destination URLs
    2172 URLs ignored
       0 warnings
       0 errors


@anthony-murphy
Contributor

Deep Review

Reviewed commit 1a4bcfa on 2026-05-06.

Readiness: 4/10 — 🔨 MAKING PROGRESS

Not ready for sign-off. The prior round's UTF-8 wire-format defect is resolved via the new IBase64BlobContents split, but two concerns from the last review remain live: (1) attachment blobs uploaded after the base snapshot are preserved as blobAttach ops in savedOps and never base64-inlined into attachmentBlobContents, so a truly offline rehydrate silently fails for those handles — flagged inline on createAndLoadContainerUtils.ts:394; (2) the IPendingContainerState extension that delivered the encoding fix is itself a serialized-format change subject to the documented gatekeeper rule and still lacks a forward/backward-compat JSDoc note — flagged inline on serializedStateManager.ts:99. Tier 3 polish (constants keep-in-sync doc, stale review-notes file, loading-groups limitation in changeset, copilot-reviewer wording/diagnostic asks) is held until the Tier 2 fix lands.

Path to Ready

Context for Reviewers

For human reviewer
  • anthony-murphy — Confirm the wire-format direction for the format extension: an optional field on IPendingContainerState with a JSDoc compat note (current) vs. a versioned envelope vs. another shape. Also: should the IPendingContainerState literal be constructed in captureFullContainerState directly (current) or via a helper in serializedStateManager for shape ownership? And: should the duplicated runtime constants live in driver-definitions (per thread #3192306888) rather than be hand-duplicated?
  • dannimad — Procedurally required as the consumer-side owner via #25653 (Expose creation of a frozen document service factory), createFrozenDocumentServiceFactory; confirm the captured artifact shape meets rehydrate-side expectations.
  • jatgarg — Procedural sign-off on the mixinMonitoringContext / configProvider divergence in createAndLoadContainerUtils.ts (the convention enforced on the sibling loadSummarizerContainerAndMakeSummary in PR #25394, On demand summarizer for on demand summaries). The author's docblock acknowledges the divergence; confirm or require an optional configProvider? prop now for forward-compat.
  • ChumpChief — Alpha-surface scaling. The "how does this scale?" question from PR #25513 (ContainerLoader: Move getPendingLocalState to legacy/alpha) remains open across four alpha additions. Resolve the convention (overload of asLegacyAlpha vs. standalone free functions).
  • tyler-cai-microsoft — Confirm the loading-groups UsageError is acceptable scope and follow-up tracking captures the parity gap with #20565 ([Data Virtualization & Offline]: Reimplement groupId offline with the loader).
  • agarwal-navin — Confirm the GC-filter / tombstone-recovery tradeoff for the captured artifact (tombstoned blobs are foreclosed by the artifact, different from the live runtime's allowTombstone recoverability).
  • Cannot be assessed by the pipeline:
    • Real-world memory/throughput on long-lived documents with millions of trailing ops.
    • Whether mapWithConcurrency limits (32 blobs / 4 group snapshots) are correct under production driver-throttling regimes (notably ODSP).
    • Whether the captured artifact loaded by loadFrozenContainerFromPendingState reproduces identical runtime behavior across all document shapes; the local-server test exercises one round-trip only.
Review history (3 prior reviews)
  • cdda9ba 2026-05-05 · 4/10 — UTF-8 wire-format defect resolved; post-snapshot blobAttach gap and format-extension JSDoc surface
  • 1d5a51b 2026-05-05 · 3/10 — UTF-8 attachment-blob round-trip flagged Tier 1
  • 57a1a76 2026-05-04 · 5/10 — GC-redirect-identity and unbounded-savedOps flagged (both later reclassified)
