Skip to content

feat(container-loader): add captureFullContainerState free function#27220

Merged
markfields merged 29 commits intomicrosoft:mainfrom
markfields:full-container-state
May 8, 2026
Merged

feat(container-loader): add captureFullContainerState free function#27220
markfields merged 29 commits intomicrosoft:mainfrom
markfields:full-container-state

Conversation

@markfields
Copy link
Copy Markdown
Member

@markfields markfields commented May 1, 2026

Adds captureFullContainerState, a @legacy @alpha free function in container-loader that
produces an IPendingContainerState JSON for an attached document without instantiating a
Loader, Container, or runtime
. It drives only IUrlResolver + IDocumentServiceFactory,
fetches the latest snapshot via getSnapshot/getSnapshotTree, drains ops via
fetchMessages(seq+1, …), and inlines blob contents and loading-group snapshots so the
artifact is fully portable.

This is the missing piece in the frozen-container series:

Together with loadFrozenContainerFromPendingState, captureFullContainerState enables
fully-offline frozen-container scenarios.

Design notes

  • GC respect is two-layered: snapshot tree walk skips unreferenced: true subtrees, and the
    attachment-blob filter parses GC tombstones / deleted-nodes with documented GC-lag
    tolerance (blobs absent from the GC graph are kept).
  • Concurrency is bounded (mapWithConcurrency: 32 blobs, 4 group snapshots).
  • Loading-group snapshots are intentionally NOT proactively captured — captureFullContainerState
    throws UsageError if any referenced subtree carries a groupId. This will be reintroduced when
    there's a known consumer and e2e harness.
  • Some snapshot/GC/blob-tree constants are duplicated from container-runtime. A
    cross-package contract test fails CI on drift.
  • Attachment-blob payloads are encoded via base64, which is different from existing inlined
    blobs since those are text-based payloads.

See also #27100, and this markdown file used during branch development,

Follow-ups

See this list of follow-ups accrued during development. These will be moved to ADO tasks once the PR is merged.

Additionally, these from Deep Review on a later iteration:

And some of my own thoughts on final review pass:

  • Do we have roundtrip test coverage of both loadExistingContainer and loadFrozenContainerFromPendingState?
  • Dedupe GC types (and code) or add contract tests to prevent drift

anthony-murphy and others added 10 commits April 20, 2026 10:48
Adds a driver-only free function that captures a container's current state
in the IPendingContainerState wire format using only an IDocumentServiceFactory
and IUrlResolver. Unlike Container.getPendingLocalState(), no runtime or
codeLoader is instantiated: the function fetches the latest snapshot, reads
the authoritative sequence number from the snapshot's attributes blob, drains
ops from delta storage from that sequence number, and serializes the result.
pendingRuntimeState is undefined, so the output is intended for state relay,
inspection, and durable-state snapshot use cases rather than rehydrating
in-flight DDS changes. The output can be fed back into loadExistingContainer
or loadFrozenContainerFromPendingState as pendingLocalState.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tainerPendingState

Extends captureContainerPendingState so the driver-level state is fully
self-contained for blob reads as well. Attachment blob bytes are fetched and
added to snapshotBlobs keyed by storage ID, which ContainerStorageAdapter
already serves through its cache — no wire-format change required.

GC state is consulted when present: blobs GC has explicitly marked
unreferencedTimestampMs, tombstoned, or deleted are skipped. Blobs absent
from the GC graph are kept, since GC lag can leave recently-attached blobs
off the graph and dropping them would lose live data. When the snapshot has
no GC tree (GC disabled or pre-GC document), every attachment blob from the
BlobManager redirect table is included.

The relevant blob manager / GC constants and the minimal parsing logic are
duplicated locally to avoid a loader → runtime dependency; comments point
back to the canonical definitions in container-runtime and
runtime-definitions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ware

Extends the driver-only capture to cover the whole referenced graph of the
container, not just attachment blobs.

- Honours ISnapshotTree.unreferenced: a shared tree walker skips any subtree
  flagged unreferenced by the summarizer (which sets the flag from GC state)
  and inlines contents of every other blob it reaches. Replaces the
  unfiltered getBlobContentsFromTree path.
- Pre-fetches loading-group snapshots: enumerates groupIds on the base
  snapshot (skipping unreferenced subtrees), fetches each via
  IDocumentStorageService.getSnapshot({ versionId, loadingGroupIds }), runs
  the fetched snapshot through the same tree walker, and serialises the
  result into IPendingContainerState.loadedGroupIdSnapshots. If the driver
  lacks getSnapshot support or no groupIds are declared, no groups are
  included.
- GC parsing is done once and shared between tree-level and attachment-blob
  filtering. captureReferencedAttachmentBlobs now takes pre-parsed GC data.

Renames captureAttachmentBlobs.ts to captureReferencedContents.ts to reflect
the broader scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the reachability filtering and groupId-fetch paths that the local
end-to-end tests can't easily exercise (no summarizer runs in the local
test server, and TestFluidObjectFactory doesn't produce loading-group
datastores out of the box). Tests construct ISnapshotTree fixtures
directly and back readBlob/getSnapshot with an in-memory shim.

15 cases across readReferencedSnapshotBlobs, parseGcSnapshotData,
captureReferencedAttachmentBlobs, and captureGroupIdSnapshots — covering
unreferenced subtree skip, root .blobs special-casing, ISnapshot vs
ISnapshotTree input, GC lag tolerance (blobs absent from gcNodes are
kept), tombstone/deletedNodes skip, and groupId enumeration/dedup/fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptureFullContainerState

The function captures the whole referenced graph of the container (snapshot,
loading-group snapshots, inlined structural blobs, inlined attachment blobs,
trailing ops) — not just "pending state." The new name matches the scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e.spec

Collapse a multi-line chained .get() call onto a single line to satisfy
biome's formatter — CI was failing on `biome check .` in local-server-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inerState

Four issues flagged by Copilot review on PR microsoft#27100:

1. captureFullContainerState created an IDocumentService but never called
   dispose(). Wrap the capture body in try/finally and dispose in the
   finally to release driver-held resources (sockets/caches).

2. readReferencedSnapshotBlobs fanned every blob read at every tree level
   into a single Promise.all, giving unbounded concurrency on large
   snapshots. Refactor into a collect-then-fetch pipeline: walk the tree
   synchronously to gather referenced blob ids, then fetch via a new
   mapWithConcurrency helper capped at 32 in-flight reads.

3. captureReferencedAttachmentBlobs had the same unbounded-parallel issue
   over all referenced attachment storage ids. Route through the same
   mapWithConcurrency helper.

4. collectUnreferencedBlobLocalIds returned undefined when gcData.gcState
   was undefined, silently dropping tombstones/deletedNodes filtering even
   when those lists were populated. Contradicted the function docs. Now
   always applies tombstones/deletedNodes regardless of gcState presence,
   and returns a (possibly empty) Set rather than undefined. Added a unit
   test covering the gcState-undefined-but-tombstones-present case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to 8a748cc: captureGroupIdSnapshots still fanned every
getSnapshot call into a single Promise.all. Route through
mapWithConcurrency with a lower limit (4) since each call pulls a whole
snapshot tree, not a single blob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ip cache

Two issues from PR review:

1. Bind getSnapshot when extracting it from the storage service. Real
   driver implementations reference `this` (e.g.,
   LocalDocumentStorageService.getSnapshot reads this.id), so calling
   the detached method would TypeError in strict mode. Mirrors the
   bind pattern in protocolTreeDocumentStorageService.ts:31. Added a
   class-based unit test stub whose getSnapshot touches `this` — would
   have caught this.

2. Pass cacheSnapshot: false on every getSnapshot call we make from the
   capture path. This capture is transient; we don't want to pollute the
   driver's snapshot cache with it. Covered by a unit test asserting the
   option is forwarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (2014 lines, 13 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
markfields added 7 commits May 5, 2026 20:33
- Wire-format consts POJO + contract test
- GC-interesting test
- Monitoring context wired (to be reverted)
- API report regenerated
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
@markfields markfields marked this pull request as ready for review May 6, 2026 00:01
Copilot AI review requested due to automatic review settings May 6, 2026 00:01
@markfields markfields requested a review from a team as a code owner May 6, 2026 00:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new driver-only capture API to @fluidframework/container-loader that can produce a portable IPendingContainerState JSON for an attached document without creating a Loader/Container/Runtime, enabling fully-offline frozen-container rehydration scenarios.

Changes:

  • Adds captureFullContainerState (@legacy @alpha) to capture the latest snapshot + post-snapshot ops, inline referenced snapshot blobs, and inline referenced attachment blob bytes (base64) for portability.
  • Extends the pending-state wire format with attachmentBlobContents (base64) and wires decoding/deduping through load (SerializedStateManager) and storage (PendingLocalStateStore).
  • Adds unit + local-server integration coverage, plus a contract test to detect drift in duplicated wire-format constants.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/loader/container-loader/src/createAndLoadContainerUtils.ts Implements captureFullContainerState and its props interface; snapshot/op capture and assembly of IPendingContainerState.
packages/loader/container-loader/src/captureReferencedContents.ts New GC-aware snapshot walker + attachment-blob capture helpers and exported wireFormatConstants.
packages/loader/container-loader/src/serializedStateManager.ts Adds attachmentBlobContents to IPendingContainerState and decodes/merges it into the blob cache on load.
packages/loader/container-loader/src/pendingLocalStateStore.ts Dedupes the new attachmentBlobContents map across stored pending states.
packages/loader/container-loader/src/containerStorageAdapter.ts Introduces IBase64BlobContents type to make the base64-vs-utf8 encoding contract explicit.
packages/loader/container-loader/src/index.ts Exports captureFullContainerState, ICaptureFullContainerStateProps, and wireFormatConstants (internal).
packages/loader/container-loader/src/test/captureReferencedContents.spec.ts Unit tests for GC parsing/filtering, referenced-blob walking, attachment-blob behavior, and loading-group detection.
packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts Local-server integration tests for capture → frozen rehydrate, ops after snapshot, nested DDS handles, and binary attachment blobs.
packages/test/local-server-tests/src/test/wireFormatConstants.spec.ts Contract test ensuring loader-duplicated wire-format constants match runtime/runtime-definitions sources.
packages/runtime/container-runtime/src/index.ts Re-exports internal blob-manager wire-format constants for the contract test.
packages/runtime/container-runtime/src/blobManager/blobManager.ts Marks blobManagerBasePath as @internal for extraction/export hygiene.
packages/runtime/container-runtime/src/blobManager/blobManagerSnapSum.ts Marks redirectTableBlobName as @internal for extraction/export hygiene.
packages/loader/container-loader/api-report/container-loader.legacy.alpha.api.md API report update for the new @legacy @alpha export and props interface.
.changeset/wide-foxes-behave.md Changeset for the new API and related internal exports.
full-container-state-review-notes.md Adds detailed review notes / design and coverage tracking document.

Comment thread packages/loader/container-loader/src/createAndLoadContainerUtils.ts Outdated
Comment thread packages/loader/container-loader/src/createAndLoadContainerUtils.ts
Comment thread packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts Outdated
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts Outdated
Comment thread packages/loader/container-loader/src/serializedStateManager.ts
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
@markfields markfields requested review from dannimad and vladsud May 7, 2026 21:24
return { tree: snapshot, read: async (id) => storage.readBlob(id) };
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: UTF-8/base64 encoding split — process sign-off ask, no code change.

captureReferencedAttachmentBlobs writes base64 (bufferToString(buffer, "base64")) into a new IBase64BlobContents; readReferencedSnapshotBlobs writes UTF-8 into ISerializableBlobContents (serializedStateManager.ts:284-299, decode in containerStorageAdapter.ts:515-527). The split is technically correct and aligns with the runtime's existing pending-blob serializer (blobManager.ts:340-343, :951-960); the non-UTF-8 round-trip e2e test at captureFullContainerState.spec.ts:351-427 covers the binary case.

Process concern: vladsud's "we should stick to one format across layers for blobs" position from PR #20507 (with anthony-murphy explicitly looped in) hasn't been revisited on the record. vladsud is already requested; please also tag anthony-murphy so the area's "one format" stance is consciously superseded rather than silently bypassed. The binary-safety argument for the split is sound — no code change expected.

* Optional logger for driver-side telemetry.
*/
readonly logger?: ITelemetryBaseLogger | undefined;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Add abortSignal?: AbortSignal to ICaptureFullContainerStateProps before @alpha graduation.

ICaptureFullContainerStateProps exposes only urlResolver, documentServiceFactory, request, logger. The capture body concurrently reads snapshot/attachment blobs (bounded concurrency 32) and drains the op stream without a cancellation check. IDocumentDeltaStorageService.fetchMessages already accepts abortSignal?: AbortSignal (packages/common/driver-definitions/src/storage.ts:103-109), so the underlying primitives support it.

Suggested fix: add abortSignal?: AbortSignal to ICaptureFullContainerStateProps, propagate it into the mapWithConcurrency blob/group loops and the fetchMessages op-drain loop, and call signal.throwIfAborted() between bounded batches. Additive and cheaper to land before the API graduates from @alpha.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, adding it to follow-ups mentioned in PR description

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Thanks — accepting as a follow-up is reasonable given the API is still @alpha @legacy. One ask: gate the follow-up to land before graduation. The prop is additive, so it's cheaper now than once the surface is frozen; please link the ADO item here when filed and resolve the thread.

attachmentBlobContents,
);
Object.assign(attachmentBlobContents, added);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Op-stream drain trusts the driver silently — assert contiguity to fail closed.

captureFullContainerState drains connectToDeltaStorage().fetchMessages(attributes.sequenceNumber + 1, …) into savedOps and emits whatever the driver yields. Other parts of the same function fail-closed on suspicious states (UsageError on loading groups; GenericError on empty snapshot fetch); ops are the one place that trusts the driver silently. A buggy driver / network truncation would produce an artifact whose ops cannot be replayed, surfacing as a confusing rehydration-time failure pointing at the loader rather than at the capture.

Suggested fix: track a previous-sequence-number across the drain loop and assert each op's sequenceNumber === prev + 1; throw on a gap with the offending pair in the message. Optional: also assert the snapshot's latestSequenceNumber >= attributes.sequenceNumber at fetch time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good point. A downside to not using the DeltaManager code in the container loader, I guess. Tracking as a follow-up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Thanks for tracking this. The contiguity assert keeps the artifact's replay invariants honest at capture time and matches the fail-closed posture used elsewhere in captureFullContainerState (UsageError on loading groups, GenericError on empty snapshot fetch). Please link the follow-up ADO item here when filed and resolve the thread; ideally land before @alpha graduation alongside #3204969959.

gcBlobPrefix: "__gc",
gcTombstoneBlobKey: "__tombstones",
gcDeletedBlobKey: "__deletedNodes",
} as const;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Loader↔runtime constant duplication — gatekeeper nod ask, no code change.

This PR introduces a wireFormatConstants block in the loader mirroring seven runtime-side constants (.blobs, .redirectTable, _blobs, gc, __gc, __tombstones, __deletedNodes) and a cross-package deepStrictEqual contract test (wireFormatConstants.spec.ts). PR #21729 (anthony-murphy, 2024-07-02) was the deliberate refactor that produced these constants in their current form, with markfields reviewing for path-format discipline. This PR is the first cross-layer consumer; the chosen pattern (duplicate + contract test) sets precedent for any future shared wire-format constant.

Tag anthony-murphy for an explicit nod on the duplicate+contract-test layering choice, or merge with a clear note that AB#72934 will revisit (per thread #3192306888). No code change expected.

Comment on lines +334 to +336
urlResolver,
documentServiceFactory,
request,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthony-murphy thinking about replacing these 3 params with just documentService and having caller piece that together (maybe in follow-up PR). Does that work? I don't totally understand the environment in which this will be called and where these params will come from

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its tricky. overall i would love to move in a direction where fluid takes a document service, but it could be hard right now due lack of usage that way. there won't be any app examples, and i'm not sure the odsp driver supports it. i would be 100% down to support both maybe via overload or union to start moving in that direction, as we need to start somewhere

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for this direction. I think we can expose both - the core function that takes instance, and wrapper that takes factory. Might need to make some changes to the service interface. For examlpe, it would be nice to get resolved URI from service (assuming it's not exposed today) and thus get rid of URI resolver dependency as well

@markfields markfields enabled auto-merge (squash) May 8, 2026 04:56
* runtime authors) would corrupt non-UTF-8 byte sequences with
* replacement characters. Populated by `captureFullContainerState`; the
* live container's pending-state path leaves this `undefined` because
* it does not inline attachment blob contents.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably much larger change (and should be considered on its own), but I wonder if something like directory / map should support arrays as payloads? In addition to strings. But that would likely require massive changes across whole stack.

try {
const storage = await documentService.connectToStorage();

const versions = await storage.getVersions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any opportunities to dedup some code with Container.load()?

documentServiceFactory,
request,
logger,
}: ICaptureFullContainerStateProps): Promise<string> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I got it right, this flow does not have websocket connection and fetching ops from websocket.
It might be enough for prototype, but we would need ops from socket (replace with REST call once/if we have such API from SPO in the future).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L390 we connect to delta storage and fetch messages. I'll have to look at what that's doing... is it possible it's already that REST API we're talking about? I forget and on mobile at the moment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh that's Storage's deltas, not PUSH, right? So would be missing the very latest messages?

@@ -0,0 +1,388 @@
/*!
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need some kind of stress test system here? I.e. a collection of random snapshots that go through new code path vs. full container load (with container runtime instantiation) and check that they produce same result in terms of ops fetched, blobs captured?
That assumes that we have equvalent flow for full container load (which we do not), so maybe we can't imlement it directly. But maybe some parts of it can be implemented, like asking GC subsystem for referenced blobs and compare that with what container-loader layer computed.

I just worry that as we make changes in runtime, they may not reflect in this layer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be integrated into local-server-stress, which already does pending state capture from live containers

Comment thread .changeset/wide-foxes-behave.md Outdated
"@fluidframework/container-runtime": minor
"__section": feature
---
Add a new free function `captureFullContainerState` for serializing a container without loading it, for rehydrating into a future session.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add a new free function `captureFullContainerState` for serializing a container without loading it, for rehydrating into a future session.
Serialize a container without loading it for rehydrating a future session with `captureFullContainerState`

I would mention that it's a free function somewhere in the longer description as opposed to the title.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to drop the changeset, we don't need to advertise this at the moment. We can do so when promoting to beta.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  288641 links
    1922 destination URLs
    2172 URLs ignored
       0 warnings
       0 errors


@anthony-murphy
Copy link
Copy Markdown
Contributor

Deep Review

Reviewed commit ad41359 on 2026-05-08.

Readiness: 8/10 — 🟢 ALMOST READY

No Tier 1 or Tier 2 issues remain. The two newest concrete asks (AbortSignal plumbing on ICaptureFullContainerStateProps, op-stream contiguity assert in the fetchMessages drain) are author-accepted as follow-ups. What's left are governance and deferred-capability items — gatekeeper acks from anthony-murphy on the wire-format extension, UTF-8/base64 split, and loader-side constant duplication; the revived-unreferenced-subtree JSDoc caveat; an attachmentBlobContents forward-compat TSDoc; and confirming the loading-group fail-fast is tracked against an ADO item with a target consumer.

Path to Ready

  • Resolve inline threads — link the AbortSignal (#3204969959) and op-contiguity-assert (#3204970080) follow-up ADO items from each thread, then resolve
  • Add anthony-murphy as a requested reviewer (or @-mention on serializedStateManager.ts, captureReferencedContents.ts, and the changeset file) so the wire-format extension, UTF-8/base64 encoding split, and loader-side constant duplication are explicitly acked by the area's gatekeeper (closes the process side of three threads — #3192334059, #3204969859, #3204970184 — as one batch)
  • Add a JSDoc note on captureFullContainerState calling out revived-unreferenced subtrees as a known self-containment caveat (#3202581532); decide whether the conservative interim UsageError (when base snapshot has any unreferenced subtree AND savedOps.length > 0) lands before @alpha graduation or whether the follow-up ADO item lands first
  • Add a TSDoc note on IPendingContainerState.attachmentBlobContents covering the forward-compat asymmetry (an older container-loader reading state written by a newer captureFullContainerState silently drops the field, making attachment blobs unreachable in offline / frozen-load scenarios)
  • Confirm the loading-group UsageError deferral is tracked against an ADO item with a target consumer (the PR risk-section text alone — "no known consumer + no e2e harness" — does not auto-prioritize)

Context for Reviewers

For human reviewer
  • anthony-murphy — Wire-format direction: optional field with JSDoc compat note (current) vs versioned envelope vs another shape; whether the IPendingContainerState literal should be constructed in captureFullContainerState directly or via a helper in serializedStateManager for shape ownership; whether the seven duplicated runtime constants belong in driver-definitions per AB#72934. Confirms or retires the "one format across layers" stance from PR Removal of omitted flag from snapshot as service will not supply it. #20507 in light of the binary-safety rationale.
  • dannimad — Procedurally required as consumer-side owner via Expose creation of a frozen document service factory. #25653 (createFrozenDocumentServiceFactory); confirm the captured artifact shape meets rehydrate-side expectations.
  • jatgarg — Procedural sign-off on the mixinMonitoringContext / configProvider divergence in createAndLoadContainerUtils.ts (the convention enforced on the sibling loadSummarizerContainerAndMakeSummary in PR On demand summarizer for on demand summaries #25394). Author's docblock acknowledges the divergence; confirm or require an optional configProvider? prop now for forward-compat.
  • ChumpChief — Alpha-surface scaling: the "how does this scale?" question from PR ContainerLoader: Move getPendingLocalState to legacy/alpha #25513 remains open across four alpha additions; resolve the convention (overload of asLegacyAlpha vs standalone free functions).
  • tyler-cai-microsoft — Confirm the loading-groups UsageError is acceptable scope and follow-up tracking captures the parity gap with PR [Data Virtualization & Offline]: Reimplement groupId offline with the loader #20565.
  • agarwal-navin — Confirm the GC-filter / tombstone-recovery tradeoff for the captured artifact (tombstoned blobs foreclosed by the artifact, different from the live runtime's allowTombstone recoverability), and weigh in on the revived-unreferenced deferral.
  • vladsud — Already engaged on threads #3209182218 (binary payloads in directory/map), #3209209246 (dedup vs Container.load()), #3209227685 (websocket vs storage-deltas op fetch); these are forward-looking design discussions, not blockers for this PR.
  • Cannot be assessed by the pipeline:
    • Real-world memory/throughput on long-lived documents with millions of trailing ops.
    • Whether mapWithConcurrency limits (32 blobs / 4 group snapshots) are correct under production driver-throttling regimes (notably ODSP).
    • Whether the captured artifact loaded by loadFrozenContainerFromPendingState reproduces identical runtime behavior across all document shapes; the local-server test exercises one round-trip only.
    • vladsud's stress-test ask in thread #3209258894 (random snapshots through capture vs full-container-load comparison) — anthony-murphy suggested integrating into local-server-stress; design judgment call on scope.
Review history (6 prior reviews)
  • c21964d 2026-05-07 · 7/10 — polish/process items remain — gatekeeper acks, AbortSignal, op-contiguity assert
  • 6e66f7b 2026-05-07 · 4/10 — post-snapshot blobAttach gap resolved; new revived-unreferenced self-containment hole and wire-format-extension JSDoc still open
  • 1a4bcfa 2026-05-06 · 4/10 — UTF-8 wire-format defect resolved; post-snapshot blobAttach gap and format-extension JSDoc surface
  • cdda9ba 2026-05-05 · 4/10 — UTF-8 wire-format defect resolved; post-snapshot blobAttach gap and format-extension JSDoc surface
  • 1d5a51b 2026-05-05 · 3/10 — UTF-8 attachment-blob round-trip flagged Tier 1
  • 57a1a76 2026-05-04 · 5/10 — GC-redirect-identity and unbounded-savedOps flagged (both later reclassified)

return added;
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Loading-group fail-fast — confirm follow-up tracking against a target consumer.

snapshotHasLoadingGroups(baseSnapshot) throws UsageError whenever any referenced subtree carries a groupId, and captureFullContainerState leaves loadedGroupIdSnapshots: undefined. Fail-fast is the right interim for an @alpha @legacy surface — preferable to emitting an incomplete artifact — and consistent with PR #20565's pending-state model for loading groups (tyler-cai-microsoft).

No code change requested. Two asks:

  1. Confirm the deferral is tracked against an open ADO/work item with a target consumer. The PR risk-section text alone ("no known consumer + no e2e harness") does not auto-prioritize; once a consumer surfaces, the per-groupId storage.getSnapshot({ loadingGroupIds: [groupId], cacheSnapshot: false }) path needs to follow PR [Data Virtualization & Offline]: Reimplement groupId offline with the loader #20565's storage convention rather than inventing a new one.
  2. Tag tyler-cai-microsoft to confirm the parity gap with PR [Data Virtualization & Offline]: Reimplement groupId offline with the loader #20565 is acceptable scope for v1.

@markfields markfields merged commit c2d70db into microsoft:main May 8, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants