@@ -40,14 +40,17 @@ import {
import type { IncrementalDecoder } from "./codecs.js";
import {
type EncodedAnyShape,
type EncodedChunkShapeV1OrV2,
type EncodedChunkShape,
type EncodedChunkShapeV2,
type EncodedChunkShapeVTextExperimental,
type EncodedFieldBatchV1OrV2,
type EncodedFieldBatchV2,
type EncodedFieldShape,
type EncodedIncrementalChunkShape,
type EncodedInlineArrayShape,
type EncodedNestedArrayShape,
type EncodedNodeShape,
type EncodedSpecializedNodeShape,
type EncodedValueShape,
SpecialField,
supportsIncrementalEncoding,
@@ -76,9 +79,109 @@ export function decode(
);
}

/**
 * Walks a chain of specialized node shapes (`f`) back to the concrete `c` node shape they are
 * ultimately based on.
 */

Review comment (Contributor): From this description and the function name, it sounds like this returns the base shape. From the implementation, it looks up a shape and normalizes out any specialized node shapes. I think the docs, and likely the function name, could use adjustment.
function resolveToNodeShape(
shapeIndex: number,
Review comment (Contributor): For clarity, use our ShapeIndex type alias for number here and in the set below.

context: DecoderContext<EncodedChunkShape>,
visited: Set<number> = new Set(),
Review comment (Contributor): Extending the API docs to cover parameters, and noting why this exists and how most callers don't need to provide it, would be good. Maybe a more specific name for it would help too: our codec does have logic to traverse and visit all shapes, which this might be confused with.

): EncodedNodeShape {
assert(!visited.has(shapeIndex), "cyclic specialized node shape chain");
visited.add(shapeIndex);
const encoded = context.shapes[shapeIndex];
assert(encoded !== undefined, "base shape index out of bounds");
if (encoded.c !== undefined) {
return encoded.c;
}
assert(
"f" in encoded && encoded.f !== undefined,
"specialized node shape base must resolve to a node shape",
);
return applySpecialization(
resolveToNodeShape(encoded.f.base, context, visited),
encoded.f,
context,
);
}

/**
* Produces a merged {@link EncodedNodeShape} by overlaying `spec` onto `base`.
*
* `type` is always inherited from `base`. `value` and `extraFields` are inherited from `base`
* unless `spec` includes them as own properties (regardless of value), in which case `spec`'s
* value wins.
*
* For `fields`, override entries whose key matches a key in `base.fields` replace that entry's
* shape index in place; override entries with keys not present in `base` are appended at the end
* in `spec.fields` order. Field order from `base` is preserved.
*/
function applySpecialization(
base: EncodedNodeShape,
spec: EncodedSpecializedNodeShape,
Review comment (Contributor): "spec" is an ambiguous abbreviation (and we try to avoid abbreviations in the first place). Most people would assume this means specification, but here it might mean specialization. Renaming this to "overrides" (and "overrides" below to "fieldOverrides") would be clearer.

context: DecoderContext<EncodedChunkShape>,
): EncodedNodeShape {
const overrides = new Map<FieldKey, number>();
for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {
Review comment (Contributor): This algorithm with its three loops seems like it could be simplified, and made a bit more intuitive, if you refactored it:
1. Copy the base fields array: const fields = [...base.fields];
2. Make a lookup for where to override: const indexFromKey = new Map(base.fields.map(...));
3. Make a pass over spec.fields, either updating items in fields using indexFromKey, or appending to the array.
That approach is more obviously a correct implementation of the stated merge policy.

const key = context.identifier<FieldKey>(keyEncoded);
assert(!overrides.has(key), "duplicate field key in specialized node shape");
overrides.set(key, shapeIndex);
}

const mergedFields: EncodedFieldShape[] = [];
const overriddenKeys = new Set<FieldKey>();
const baseKeys = new Set<FieldKey>();
for (const [keyEncoded, shapeIndex] of base.fields ?? []) {
const key = context.identifier<FieldKey>(keyEncoded);
assert(!baseKeys.has(key), "duplicate field key in base node shape");
baseKeys.add(key);
const overrideShape = overrides.get(key);
if (overrideShape === undefined) {
mergedFields.push([keyEncoded, shapeIndex]);
} else {
overriddenKeys.add(key);
mergedFields.push([keyEncoded, overrideShape]);
}
}
for (const [keyEncoded, shapeIndex] of spec.fields ?? []) {
Review comment (Contributor): A code comment on these loops would be helpful. This one could be something like:
// Add all fields for all overrides from spec that are new fields, in the order they are specified

if (!overriddenKeys.has(context.identifier<FieldKey>(keyEncoded))) {
Review comment (Contributor): This could just use overrides instead of overriddenKeys; then overriddenKeys can be entirely removed.

mergedFields.push([keyEncoded, shapeIndex]);
}
}

return {
type: base.type,
value: "value" in spec ? spec.value : base.value,
fields: mergedFields.length > 0 ? mergedFields : undefined,
extraFields: "extraFields" in spec ? spec.extraFields : base.extraFields,
Review comment (Contributor): This is going to break, due to JSON not persisting undefined fields, if an override tries to remove optional fields. I think you might need to store some other value (like 0, false, or null; 0 is the shortest of those, so maybe pick that) to encode that case. The fact this was missed implies there might be a gap in test coverage, either for this case and/or in not testing with JSON round-tripping.

};
}

/**
* Decoder for {@link EncodedSpecializedNodeShape}s.
* Merges the specialization's field overrides with the resolved base node shape, then delegates
* to a {@link NodeDecoder} built from the merged shape.
*/
export class SpecializedNodeDecoder implements ChunkDecoder {
private readonly inner: NodeDecoder;
public constructor(
shape: EncodedSpecializedNodeShape,
context: DecoderContext<EncodedChunkShape>,
) {
this.inner = new NodeDecoder(
applySpecialization(resolveToNodeShape(shape.base, context), shape, context),
context,
);
}
public decode(decoders: readonly ChunkDecoder[], stream: StreamCursor): TreeChunk {
return this.inner.decode(decoders, stream);
}
}

const decoderLibrary = new DiscriminatedUnionDispatcher<
EncodedChunkShapeV1OrV2,
[context: DecoderContext<EncodedChunkShapeV1OrV2>],
EncodedChunkShapeVTextExperimental,
[context: DecoderContext<EncodedChunkShape>],
ChunkDecoder
>({
a(shape: EncodedNestedArrayShape, context): ChunkDecoder {
@@ -99,6 +202,9 @@ const decoderLibrary = new DiscriminatedUnionDispatcher<
): ChunkDecoder {
return new IncrementalChunkDecoder(context);
},
f(shape: EncodedSpecializedNodeShape, context): ChunkDecoder {
return new SpecializedNodeDecoder(shape, context);
},
});

/**
@@ -300,7 +406,7 @@ type BasicFieldDecoder = (
* Get a decoder for fields of a provided (via `shape` and `context`).
*/
function fieldDecoder(
context: DecoderContext<EncodedChunkShapeV1OrV2>,
context: DecoderContext<EncodedChunkShape>,
key: FieldKey,
shape: number,
): BasicFieldDecoder {
@@ -319,7 +425,7 @@ export class NodeDecoder implements ChunkDecoder {
private readonly fieldDecoders: readonly BasicFieldDecoder[];
public constructor(
private readonly shape: EncodedNodeShape,
private readonly context: DecoderContext<EncodedChunkShapeV1OrV2>,
private readonly context: DecoderContext<EncodedChunkShape>,
) {
this.type = shape.type === undefined ? undefined : context.identifier(shape.type);

@@ -31,7 +31,7 @@ import type { FieldBatch } from "./fieldBatch.js";
import {
type EncodedAnyShape,
type EncodedChunkShapeV1,
type EncodedChunkShapeV1OrV2,
type EncodedChunkShape,
type EncodedChunkShapeV2,
type EncodedFieldBatchV1OrV2,
type EncodedNestedArrayShape,
@@ -63,8 +63,8 @@ export function compressedEncode(
return updateShapesAndIdentifiersEncoding(context.version, batchBuffer);
}

export type BufferFormat = BufferFormatGeneric<EncodedChunkShapeV1OrV2>;
export type Shape = ShapeGeneric<EncodedChunkShapeV1OrV2>;
export type BufferFormat = BufferFormatGeneric<EncodedChunkShape>;
export type Shape = ShapeGeneric<EncodedChunkShape>;

/**
* Like {@link FieldEncoder}, except data will be prefixed with the key.
@@ -166,7 +166,7 @@ export function asNodesEncoder(encoder: NodeEncoder): NodesEncoder {
/**
* Encodes a chunk with {@link EncodedAnyShape} by prefixing the data with its shape.
*/
export class AnyShape extends ShapeGeneric<EncodedChunkShapeV1OrV2> {
export class AnyShape extends ShapeGeneric<EncodedChunkShape> {
private constructor() {
super();
}
@@ -271,7 +271,7 @@ export const anyFieldEncoder: FieldEncoder = {
* which is an easy way to keep all the related code together without extra objects.
*/
export class InlineArrayEncoder
extends ShapeGeneric<EncodedChunkShapeV1OrV2>
extends ShapeGeneric<EncodedChunkShape>
implements NodesEncoder, FieldEncoder
{
public static readonly empty: InlineArrayEncoder = new InlineArrayEncoder(0, {
@@ -355,7 +355,7 @@ export class InlineArrayEncoder
/**
* Encodes the shape for a nested array as {@link EncodedNestedArrayShape} shape.
*/
export class NestedArrayShape extends ShapeGeneric<EncodedChunkShapeV1OrV2> {
export class NestedArrayShape extends ShapeGeneric<EncodedChunkShape> {
/**
* @param innerShape - The shape of each item in this nested array.
*/
@@ -366,7 +366,7 @@ export class NestedArrayShape extends ShapeGeneric<EncodedChunkShapeV1OrV2> {
public encodeShape(
identifiers: DeduplicationTable<string>,
shapes: DeduplicationTable<Shape>,
): EncodedChunkShapeV1OrV2 {
): EncodedChunkShape {
const shape: EncodedNestedArrayShape =
shapes.valueToIndex.get(this.innerShape) ??
fail(0xb4f /* index for shape not found in table */);
@@ -16,16 +16,20 @@ import { shapesV1 } from "./formatV1.js";
export type EncodedIncrementalChunkShape = Static<typeof EncodedIncrementalChunkShape>;
export const EncodedIncrementalChunkShape = Type.Literal(0);

/**
* The chunk shapes supported by the V2 format.
* @remarks
* See {@link EncodedChunkShapeV2}.
*/
export const shapesV2 = {
...shapesV1,
e: Type.Optional(EncodedIncrementalChunkShape),
} as const;

/**
* V2 extension of {@link EncodedChunkShapeV1}.
* @remarks
* See {@link DiscriminatedUnionDispatcher} for more information on this pattern.
*/
export type EncodedChunkShapeV2 = Static<typeof EncodedChunkShapeV2>;
export const EncodedChunkShapeV2 = Type.Object(
{
...shapesV1,
e: Type.Optional(EncodedIncrementalChunkShape),
},
unionOptions,
);
export const EncodedChunkShapeV2 = Type.Object(shapesV2, unionOptions);
@@ -0,0 +1,74 @@
/*!
* Copyright (c) Microsoft Corporation and contributors. All rights reserved.
* Licensed under the MIT License.
*/

import { type Static, Type } from "@sinclair/typebox";

import { unionOptions } from "../../../../codec/index.js";

import { ShapeIndex } from "./formatGeneric.js";
import { EncodedFieldShape, EncodedValueShape } from "./formatV1.js";
import { shapesV2 } from "./formatV2.js";

/**
* A node shape that derives from another node shape by overlaying property-level overrides.
*
* @remarks
* Compresses runs of node shapes that differ only in a few properties: a base node shape
* defines the structural skeleton, and the specialization narrows specific properties.
*
* For example, a base `FormatNode` with a variable-boolean `bold` field can be specialized
* to a shape that pins `bold` to a constant `true` — every node decoded with the
* specialization contributes zero stream tokens for `bold`.
*
* Merge rules (see {@link applySpecialization}): `type` is always inherited from the resolved
* base. `fields`, `value`, and `extraFields` are inherited unless the specialization sets them
* as own properties — to inherit, the property must be omitted; setting it explicitly (even to
* `false` or `undefined`) is treated as an override.
*/
export type EncodedSpecializedNodeShape = Static<typeof EncodedSpecializedNodeShape>;
export const EncodedSpecializedNodeShape = Type.Object(
{
/**
* Index into the enclosing batch's shapes array of the shape this specializes.
* Must resolve to either an {@link EncodedNodeShape} or another
* `EncodedSpecializedNodeShape`; chains are followed transitively until a node shape
* is reached. This restriction is enforced at runtime, not by the schema.
*/
base: ShapeIndex,
/**
* Field-level overrides applied to the resolved base's `fields`. Entries whose key
* matches a base field replace that field's shape index in place; entries with new
* keys are appended after all base fields, in the order given here. Base field order
* is preserved — this is the stream consumption order at decode time, so encoders
* must serialize per-field tokens in the merged order, not in this list's order.
*/
fields: Type.Optional(Type.Array(EncodedFieldShape)),
/**
* If present, replaces the resolved base's value shape.
*/
value: Type.Optional(EncodedValueShape),
/**
* If present, replaces the resolved base's extraFields shape.
*/
extraFields: Type.Optional(ShapeIndex),
},
{ additionalProperties: false },
);

/**
* Experimental extension of {@link EncodedChunkShapeV2}.
* @remarks
* See {@link DiscriminatedUnionDispatcher} for more information on this pattern.
*/
export type EncodedChunkShapeVTextExperimental = Static<
typeof EncodedChunkShapeVTextExperimental
>;
export const EncodedChunkShapeVTextExperimental = Type.Object(
{
...shapesV2,
f: Type.Optional(EncodedSpecializedNodeShape),
},
unionOptions,
);
@@ -16,14 +16,19 @@ export {
SpecialField,
} from "./formatV1.js";
export { EncodedIncrementalChunkShape, EncodedChunkShapeV2 } from "./formatV2.js";
export {
EncodedChunkShapeVTextExperimental,
EncodedSpecializedNodeShape,
} from "./formatVText.js";
export {
FieldBatchFormatVersion,
EncodedFieldBatchV1,
EncodedFieldBatchV2,
EncodedFieldBatchVTextExperimental,
supportsIncrementalEncoding,
type EncodedFieldBatchV1OrV2,
type EncodedFieldBatchV1AndV2,
type EncodedChunkShapeV1OrV2,
type EncodedChunkShape,
} from "./versions.js";
export type {
ShapeIndex,
@@ -10,6 +10,7 @@ import { strictEnum, type Values } from "../../../../util/index.js";
import { EncodedFieldBatchGeneric } from "./formatGeneric.js";
import { EncodedChunkShapeV1 } from "./formatV1.js";
import { EncodedChunkShapeV2 } from "./formatV2.js";
import { EncodedChunkShapeVTextExperimental } from "./formatVText.js";

/**
* The format version for the field batch.
@@ -29,6 +30,10 @@ export const FieldBatchFormatVersion = strictEnum("FieldBatchFormatVersion", {
* {@link EncodedIncrementalChunkShape} was added in this version.
*/
v2: 2,
/**
* Experimental codec with optimizations for text.
*/
vTextExperimental: "text",
});

/**
@@ -68,6 +73,17 @@ export const EncodedFieldBatchV2 = EncodedFieldBatchGeneric(
EncodedChunkShapeV2,
);

/**
* Encoded {@link FieldBatch} using the experimental text optimized format.
*/
export type EncodedFieldBatchVTextExperimental = Static<
typeof EncodedFieldBatchVTextExperimental
>;
export const EncodedFieldBatchVTextExperimental = EncodedFieldBatchGeneric(
FieldBatchFormatVersion.vTextExperimental,
EncodedChunkShapeVTextExperimental,
);

/**
* Encoded {@link FieldBatch}, which might use V2 features, but might also have been from a V1 encoder.
* @remarks
@@ -88,9 +104,14 @@ export type EncodedFieldBatchV1OrV2 = EncodedFieldBatchV1 | EncodedFieldBatchV2;
export type EncodedFieldBatchV1AndV2 = EncodedFieldBatchV1 & EncodedFieldBatchV2;

/**
* Encoded chunk shape, which might use V2 features, but might also have been from a V1 encoder.
* An encoded chunk shape from any known {@link FieldBatchFormatVersion}.
*
* @remarks
* Type wise, equivalent to V2, as that is a superset of V1.
* Used instead of just V2 for clarity.
* Use this when working with chunk shapes uniformly across versions — for example, in the
* shared decoder dispatcher and in encoder shape base classes. New format versions should
* add their chunk shape variant to this union.
*/
export type EncodedChunkShapeV1OrV2 = EncodedChunkShapeV1 | EncodedChunkShapeV2;
export type EncodedChunkShape =
| EncodedChunkShapeV1
| EncodedChunkShapeV2
| EncodedChunkShapeVTextExperimental;