
feat(tree): internals for field batch format with specialized node shapes #27200

Open
justus-camp-microsoft wants to merge 4 commits into microsoft:main from
justus-camp-microsoft:new_experimental_codec

Conversation

@justus-camp-microsoft
Contributor

justus-camp-microsoft commented Apr 29, 2026

Description

Adds internals for a new experimental chunked-forest codec format (vTextExperimental, version key "text") targeted at compressing runs of nodes that differ only in a few properties — for example, styled text runs where many nodes share most of their structural skeleton.

The core addition is EncodedSpecializedNodeShape (f shape variant): a node shape that derives from another node shape by overlaying property-level overrides. A base node shape defines the structural skeleton; the specialization specifies overrides for those properties.

Decode-side support lives in SpecializedNodeDecoder, which walks the specialized f shapes back to a given c shape via normalizeToNodeShape, then merges via applySpecialization: base field order is preserved, overridden fields are replaced in place, and new fields are appended. Cycles and missing bases are caught with asserts.
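The merge described above can be sketched roughly as follows. This is a minimal sketch with hypothetical, simplified types; the real EncodedNodeShape and EncodedSpecializedNodeShape schemas carry more structure, and the real code resolves shape indices through a decoder context.

```typescript
// Hypothetical, simplified stand-ins for the codec's shape types.
type FieldEntry = [key: string, shapeIndex: number];

interface NodeShape {
  fields: FieldEntry[];
}

interface SpecializedNodeShape {
  base: number; // Index of the base shape (resolution is out of scope here).
  fields?: FieldEntry[]; // Property-level field overrides.
}

// Sketch of the merge: base field order is preserved, overridden fields
// are replaced in place, and new fields are appended.
function applySpecializationSketch(
  base: NodeShape,
  specialization: SpecializedNodeShape,
): NodeShape {
  const overrides = new Map(specialization.fields ?? []);
  const merged: FieldEntry[] = base.fields.map(
    ([key, shape]): FieldEntry => [key, overrides.get(key) ?? shape],
  );
  const baseKeys = new Set(base.fields.map(([key]) => key));
  for (const [key, shape] of specialization.fields ?? []) {
    if (!baseKeys.has(key)) {
      merged.push([key, shape]); // New fields go at the end, in override order.
    }
  }
  return { fields: merged };
}
```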

Existing types broadened for the new union:

  • EncodedChunkShapeV1OrV2 renamed to EncodedChunkShape (now V1 | V2 | VTextExperimental).
  • shapesV2 factored out of EncodedChunkShapeV2 so the new format can extend it.

This PR is decoder + format scaffolding only. Encoders are not yet emitting f shapes; that will follow in a separate change. The codec will be registered as part of introducing the encoder.

Reviewer Guidance

The review process is outlined on this wiki page.

Copilot AI review requested due to automatic review settings April 29, 2026 16:37
@justus-camp-microsoft justus-camp-microsoft requested a review from a team as a code owner April 29, 2026 16:37
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (1317 lines, 8 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Contributor

Copilot AI left a comment

Pull request overview

Adds scaffolding for an experimental “text” chunked-forest codec shape (f) that can derive node shapes from a base shape via property-level overrides, enabling better compression for runs of structurally similar nodes.

Changes:

  • Introduces EncodedSpecializedNodeShape / EncodedChunkShapeVTextExperimental schema and adds vTextExperimental: "text" to FieldBatchFormatVersion.
  • Adds decode-side support for f shapes via SpecializedNodeDecoder plus shape-chain resolution and specialization merging.
  • Broadens encoder/decoder shape typing to a unified EncodedChunkShape union and adds targeted unit tests for specialization behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

File summaries:

  • packages/dds/tree/src/test/feature-libraries/chunked-forest/codec/chunkDecoding.spec.ts: Adds extensive unit coverage for specialized-node decoding and dispatcher integration.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/nodeEncoder.ts: Updates encoder shape generics to use the new EncodedChunkShape union type.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/format/versions.ts: Adds the vTextExperimental version and defines the EncodedChunkShape union including the new format.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/format/index.ts: Re-exports the new format schema/types and the updated chunk-shape union.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/format/formatVText.ts: Defines the experimental text codec schema, including the new f shape.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/format/formatV2.ts: Factors out shapesV2 for reuse by the experimental text schema.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/compressedEncode.ts: Broadens encoder buffer/shape generics to EncodedChunkShape.
  • packages/dds/tree/src/feature-libraries/chunked-forest/codec/chunkDecoding.ts: Implements SpecializedNodeDecoder, specialization merge logic, and dispatcher support for f.

}

/**
* Walks a chain of specialized node shapes (`f`) back to the concrete `c` node shape they are
Contributor

From this description and the function name, it sounds like this returns the base shape.

From the implementation, this looks like it looks up the shape and normalizes away any specialized node shapes. I think the docs and likely the function name could use adjustment.

@CraigMacomber
Contributor

Note about the PR title: this is a new format for an existing codec, not a new codec (our code doesn't separate these well yet, but I've been trying to make it clearer). I'd describe this as "add internals for experimental field batch codec format with specialized node shapes" (or maybe shortened to "internals for field batch format with specialized node shapes" for the description).

It might make sense to include the addition of the new format to fieldBatchCodecBuilder as part of this (lack of that is why I called this "internals" above), and thus support actually encoding and decoding the new format (just not using f in the encoder yet). That would opt the new code into per-format test coverage for things like round tripping and JSON schema snapshotting. Since the format version is unstable (a string), that should be safe. I'm fine with it either way though.

@justus-camp-microsoft justus-camp-microsoft changed the title feat(tree): add experimental codec with specialized node shapes feat(tree): internals for field batch format with specialized node shapes Apr 30, 2026
@justus-camp-microsoft
Contributor Author

Note about the PR title: this is a new format for an existing codec, not a new codec (our code doesn't separate these well yet, but I've been trying to make it clearer). I'd describe this as "add internals for experimental field batch codec format with specialized node shapes" (or maybe shortened to "internals for field batch format with specialized node shapes" for the description).

It might make sense to include the addition of the new format to fieldBatchCodecBuilder as part of this (lack of that is why I called this "internals" above), and thus support actually encoding and decoding the new format (just not using f in the encoder yet). That would opt the new code into per-format test coverage for things like round tripping and JSON schema snapshotting. Since the format version is unstable (a string), that should be safe. I'm fine with it either way though.

I updated the title/description to denote that the codec isn't actually registered and that this is internals. I think it's easier to register the codec once we have both the encoder and decoder so I'll wait for the next change to do that if that's alright.

* Exported for testing.
*/
export function normalizeToNodeShape(
shapeIndex: number,
Contributor


For clarity, use our ShapeIndex type alias for number here and in the set below.

export function normalizeToNodeShape(
shapeIndex: number,
context: DecoderContext<EncodedChunkShape>,
visited: Set<number> = new Set(),
Contributor


Extending the API docs to cover parameters, and noting why this exists and how most callers don't need to provide it would be good.

Maybe a more specific name for it would help too: our codec does have logic to traverse and visit all shapes, which this might be confused with.
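To make the suggestion concrete, here is a minimal sketch of the chain resolution with a more specifically named visited set. The types and names are illustrative only and do not match the real normalizeToNodeShape signature, which takes a DecoderContext and a ShapeIndex.

```typescript
// Illustrative shape table: "f" entries specialize another shape by index,
// "c" entries are concrete node shapes. Names here are hypothetical.
type Shape =
  | { kind: "c"; name: string }
  | { kind: "f"; base: number };

type ConcreteShape = Extract<Shape, { kind: "c" }>;

// Follows the chain of "f" shapes down to the underlying "c" shape.
// The visited set exists only to detect cycles; callers normally omit it.
function normalizeToNodeShapeSketch(
  shapeIndex: number,
  shapes: readonly Shape[],
  visitedSpecializations: Set<number> = new Set(),
): ConcreteShape {
  if (visitedSpecializations.has(shapeIndex)) {
    throw new Error("Cycle in specialized node shapes");
  }
  visitedSpecializations.add(shapeIndex);
  const shape = shapes[shapeIndex];
  if (shape === undefined) {
    throw new Error("Missing base shape");
  }
  return shape.kind === "c"
    ? shape
    : normalizeToNodeShapeSketch(shape.base, shapes, visitedSpecializations);
}
```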

}

/**
* Resolves `shapeIndex` to a fully-merged {@link EncodedNodeShape}, normalizing away any
Contributor


I think this should either:

  1. document that it is only allowed to be used on EncodedNodeShape | SpecializedNodeShape
    OR
  2. work on any node shape, and return a union of all the types except SpecializedNodeShape
    OR
  3. Take in the input as type EncodedNodeShape | SpecializedNodeShape instead of a shape index.

* @remarks
* Exported for testing.
*/
export function applySpecialization(
Contributor


EncodedSpecializedNodeShape documents the merge rules: this should refer to/link that and that should likely link here as well.

Also I noticed an issue with those documented rules:

extraFields are inherited unless the specialization sets them as own properties — to inherit, the property must be omitted; setting it explicitly (even to false or undefined) is treated as an override.

These types are supposed to be JSON compatible since they get json serialized. JSON does not preserve undefined properties, so we should not have them be significant or it won't work with real encoded data (I'm not sure if this is a docs bug or a real format issue)
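A quick illustration of the round-trip concern (this is plain JSON/TypeScript behavior, not the codec itself): JSON.stringify drops own properties whose value is undefined, so after serialization a decoder cannot distinguish "explicitly set to undefined" from "omitted".

```typescript
// An explicitly-undefined property is an own property locally...
const specialization: { base: number; extraFields?: number | undefined } = {
  base: 0,
  extraFields: undefined,
};
const beforeRoundTrip = "extraFields" in specialization; // own property exists

// ...but JSON.stringify drops it, so after a round trip this object is
// indistinguishable from one that never mentioned extraFields at all.
const roundTripped: object = JSON.parse(JSON.stringify(specialization));
const afterRoundTrip = "extraFields" in roundTripped;
```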

Contributor

Also that doc calls these rules "Merge rules": given we have a bunch of logic about merges in this codebase which is about a different thing, calling them specialization rules or override rules would be more clear I think.

}
}
for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {
if (!overriddenKeys.has(context.identifier<FieldKey>(keyEncoded))) {
Contributor


This could just use overrides instead of overriddenKeys, then overriddenKeys can be entirely removed.

mergedFields.push([keyEncoded, overrideShape]);
}
}
for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {
Contributor


A code comment on these loops would be helpful. This one could be something like:

Suggested change:
- for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {
+ // Add all fields for all overrides from spec that are new fields, in the order they are specified
+ for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {

*/
export function applySpecialization(
base: EncodedNodeShape,
spec: EncodedSpecializedNodeShape,
Contributor


"spec" is an ambiguous abbreviation (and we try to avoid abbreviations in the first place). Most people would assume this means specification, but here it might mean specialization.

Renaming this to "overrides" (and "overrides" below to "fieldOverrides") would be more clear.

context: DecoderContext<EncodedChunkShape>,
): EncodedNodeShape {
const overrides = new Map<FieldKey, number>();
for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {
Contributor


This algorithm with its three loops seems like it could be simplified, and made a bit more intuitive, if you refactored it.

Copy the base fields array: const fields = [...base.fields];
Make a lookup for where to override: const indexFromKey = new Map(base.fields.map(...));
Make a pass over spec.fields, either updating items in fields using indexFromKey, or appending to the array.

I feel like that approach is more obviously a correct implementation of the stated merge policy.
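The suggested single-pass refactor could look roughly like this, using hypothetical simplified types (plain string keys and numeric shape indices) rather than the codec's encoded types:

```typescript
type FieldEntry = [key: string, shapeIndex: number];

// Single-pass version of the field merge: copy the base fields, build a
// key-to-index lookup, then update in place or append.
function mergeFieldsSinglePass(
  baseFields: readonly FieldEntry[],
  overrideFields: readonly FieldEntry[],
): FieldEntry[] {
  // Copy the base fields array so base order is preserved.
  const fields: FieldEntry[] = baseFields.map(
    ([key, shape]): FieldEntry => [key, shape],
  );
  // Lookup for where each base key lives, to support in-place override.
  const indexFromKey = new Map(
    baseFields.map(([key], index): [string, number] => [key, index]),
  );
  for (const [key, shapeIndex] of overrideFields) {
    const index = indexFromKey.get(key);
    if (index === undefined) {
      fields.push([key, shapeIndex]); // New field: append in override order.
    } else {
      fields[index] = [key, shapeIndex]; // Existing field: replace in place.
    }
  }
  return fields;
}
```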

type: base.type,
value: "value" in spec ? spec.value : base.value,
fields: mergedFields.length > 0 ? mergedFields : undefined,
extraFields: "extraFields" in spec ? spec.extraFields : base.extraFields,
Contributor


This is going to break due to JSON not persisting undefined fields if trying to remove optional fields in an override. I think you might need to store some other value (like 0, false or null: 0 is the shortest of those so maybe pick that) to encode that case.

The fact you missed this implies that there might be a gap in test coverage, either for this case and/or not testing with the JSON round tripping.

context: DecoderContext<EncodedChunkShape>,
) {
this.inner = new NodeDecoder(
applySpecialization(normalizeToNodeShape(shape.base, context), shape, context),
Contributor


can't you replace this with a single call to normalizeToNodeShape on shape? (Might be worth tweaking the signature for normalizeToNodeShape to enable that.)

