diff --git a/docs/architecture docs/adrs/ADR 208 - Higher-order codecs for parameterized types.md b/docs/architecture docs/adrs/ADR 208 - Higher-order codecs for parameterized types.md index eaa0e18308..c00966bb0d 100644 --- a/docs/architecture docs/adrs/ADR 208 - Higher-order codecs for parameterized types.md +++ b/docs/architecture docs/adrs/ADR 208 - Higher-order codecs for parameterized types.md @@ -113,7 +113,11 @@ The same `vector(1536)` participates in four code paths. Each reads a different ### 2. No-emit type resolution -`@prisma-next/sql-contract-ts`'s `FieldOutputType` follows `typeRef` through `storage.types`, then synthetically applies `CodecInstanceContext` to the column's `type` slot at the type level and reads the `Js` parameter off the resulting `Codec<…, Js>`. For `vector(1536)`, this produces `Vector<1536>` (literal `N` preserved through curried application). For non-parameterized columns (no `type` slot), it falls back to `CodecTypes[codecId]['output']`. Nullability is reattached uniformly. +> **Status — partial.** As shipped, `@prisma-next/sql-contract-ts`'s `FieldOutputType` resolves through `CodecTypesFromDefinition[codecId]['output']` only. Parameterized columns therefore fall back to the codec's base output type in no-emit mode: `vector(1536)` resolves to `number[]` (the codec's base `output`) instead of `Vector<1536>`, and `arktypeJson(schema)` resolves to `unknown` because `@prisma-next/extension-arktype-json/codec-types` declares its `arktype/json@1` entry's `output` as `unknown`. Authors who skip `pnpm emit` get codec-base types, not the parameterized refinements emitted into `contract.d.ts`. Tracked under TML-2357. +> +> The shape described below is the eventual target: the `type: (ctx: CodecInstanceContext) => Codec<…, Js>` slot is already on `ColumnTypeDescriptor` (authoring-time only, never serialized) so `arktypeJson(schema).type` carries the inferred shape at the type level today. 
What's missing is the `FieldOutputType` resolver wiring that follows `typeRef` and applies the slot to read the `Js` parameter off the resulting codec. + +`@prisma-next/sql-contract-ts`'s `FieldOutputType` will follow `typeRef` through `storage.types`, then synthetically apply `CodecInstanceContext` to the column's `type` slot at the type level and read the `Js` parameter off the resulting `Codec<…, Js>`. For `vector(1536)`, this will produce `Vector<1536>` (literal `N` preserved through curried application). For non-parameterized columns (no `type` slot), it falls back to `CodecTypes[codecId]['output']` (today's behavior for every column). Nullability is reattached uniformly. ### 3. Emit-path rendering @@ -151,7 +155,7 @@ Both problems share a root cause: the type-level facts about a parameterized col ### What works better - **One artifact per codec.** The pack author writes one curried factory function and one descriptor. The descriptor's `renderOutputType` is the only piece the framework owns separately, and only because the emit path runs without the factory in scope. -- **Type fidelity end-to-end.** `vector(1536)` resolves to `Vector<1536>` at authoring time, in the no-emit path, in the emitted `contract.d.ts`, and at runtime decode. `arktypeJson(ProductSchema)` resolves to the schema's inferred output. Future column-scoped stateful codecs (e.g. encryption) resolve to their declared output even though the wire is ciphertext. +- **Type fidelity through the emit path.** `vector(1536)` resolves to `Vector<1536>` at authoring time and in the emitted `contract.d.ts`; `arktypeJson(ProductSchema)` resolves to the schema's inferred output in `contract.d.ts`. The no-emit path still resolves through the codec's base output type today (see § "No-emit type resolution" status block above) — `Vector<1536>` and the arktype schema's inferred shape land in `contract.d.ts` only after `pnpm emit`. Resolver wiring is tracked under TML-2357. 
Future column-scoped stateful codecs (e.g. encryption) inherit the same staging. - **Non-branching descriptor reads.** `descriptorFor('pg/text@1').traits` and `descriptorFor('pg/vector@1').traits` use the same call shape. Non-parameterized codecs are the degenerate `P = void` case; consumers don't ask "is this codec parameterized" before reading metadata. The four sites that previously read traits via `context.codecs.traitsOf(codecId)` migrated to `context.codecDescriptors.descriptorFor(codecId).traits` without behavior change. - **Framework-components stays library-agnostic.** `paramsSchema: StandardSchemaV1
` keeps arktype confined to the codec authors that opt into it; a future extension that prefers zod or valibot satisfies the same descriptor shape without `framework-components` depending on either library. - **Forward-compat for column-scoped stateful codecs.** Column-scoped encryption and similar codecs author against `(params, ctx)` today using the same surface pack authors already adopted. The contract-load runtime materialization is a documented contract. @@ -160,13 +164,13 @@ Both problems share a root cause: the type-level facts about a parameterized col - **`ColumnTypeDescriptor` grew an authoring-time `type` slot.** The optional `type?: (ctx: CodecInstanceContext) => Codec` field is the price of letting the no-emit resolver read the factory's return type without reaching into the runtime codec registry. The slot is structurally optional, ignored by the IR serializer, and never appears in `contract.json`. - **Per-library extensions own JSON-with-schema.** A schema-typed JSON column is not a postgres-adapter concept; it's a per-library concept. The cost is one more import for users who want a typed JSON column; the benefit is that each library ships a lossless pipeline rather than a generic Standard-Schema-driven shape that's lossy for narrowed types. -- **Encode-side `forCodecId` legacy fallback (carved out, AC-5-deferred).** `ParamRef` carries `codecId` but not `(table, column)` today, so encode-side dispatch consults `contractCodecs.forCodecId(codecId)` instead of `forColumn`. The fallback works for the parameterized codecs shipped at this ADR's landing because their encode is per-instance-stateless w.r.t. params (pgvector formats `[v1,v2,…]` regardless of declared length; arktype-json's encode is `JSON.stringify`). The carve-out is documented at the registry boundary in `relational-core/src/ast/codec-types.ts:101-129` and retires under TML-2357 once `ParamRef.refs` is threaded through column-bound construction sites. 
+- **Encode-side `forCodecId` legacy fallback (carved out, AC-5-deferred).** `ParamRef` carries `codecId` but not `(table, column)` today, so encode-side dispatch consults `contractCodecs.forCodecId(codecId)` instead of `forColumn`. The fallback only works when the parameterized codec's encode is per-instance-stateless w.r.t. params: pgvector formats `[v1,v2,…]` regardless of declared length; arktype-json's encode is `JSON.stringify` with **no schema check** (validation runs on decode only — see § Per-library JSON extensions). Codec descriptors that are encode-equivalent across params declare `encodeIsParamsIndependent: true`, which tells the runtime registry not to mark the codec id ambiguous when multiple distinct resolved instances share it (so a contract with two `arktypeJson(...)` columns or two `vector(N)` columns of different lengths can encode through `forCodecId` without rejection). The carve-out is documented at the registry boundary in `relational-core/src/ast/codec-types.ts:101-129` and retires under TML-2357 once `ParamRef.refs` is threaded through column-bound construction sites — at which point `encodeIsParamsIndependent` becomes vestigial. - **Heterogeneous-`P` registry boundary.** `descriptorFor(codecId): CodecDescriptor
` is structurally heterogeneous across codec ids — `P` is `void` for `pg/text@1`, `{ length: number }` for `pg/vector@1`, `{ expression; jsonIr }` for `arktype/json@1`, etc. The registry's interface methods cannot be honestly typed at the registry level without `` at the boundary; consumers narrow per codec id at the call site. A typed-dispatch / sealed-visitor refactor would eliminate the suppressions but is not in scope; the registry interface uses `CodecDescriptor` with documented one-line rationale comments at the four production sites. - **Emit-only `Codec` shim for per-library extensions.** The framework emitter consults a single codec-id-keyed `CodecLookup` to resolve `renderOutputType`. Per-library extensions whose codec instance is materialized through the descriptor's factory at runtime can't naturally participate in that lookup at emit time. The arktype-json package ships an emit-only `Codec` instance (`arktypeJsonEmitCodec`) carrying just `renderOutputType`; encode/decode are sentinels that throw if invoked. A future cleanup that routes the emit path through `descriptorFor` retires the shim — tracked under TML-2357. ### Per-library JSON extensions -`@prisma-next/extension-arktype-json` ships `arktypeJson(schema)`. The codec id (`arktype/json@1`) is library-bound, not target-bound. The factory eagerly serializes `schema.expression` (TypeScript-source-like rendering) and `schema.json` (arktype's internal IR) into `typeParams` at the column-author site; the descriptor's factory rehydrates via `ark.schema(typeParams.jsonIr)` and validates internally in `decode`. The emit-path renderer reads `expression` directly so `contract.d.ts` carries the schema's source-like rendering with full fidelity. +`@prisma-next/extension-arktype-json` ships `arktypeJson(schema)`. The codec id (`arktype/json@1`) is library-bound, not target-bound. 
The factory eagerly serializes `schema.expression` (TypeScript-source-like rendering) and `schema.json` (arktype's internal IR) into `typeParams` at the column-author site; the descriptor's factory rehydrates via `ark.schema(typeParams.jsonIr)`. The codec validates against the schema in `decode` (and in `decodeJson` for JsonValue payloads); `encode` is intentionally schema-independent (`JSON.stringify` only) — see the encode-fallback trade-off above. This matches the JSON-validator philosophy: payloads can come from any source (this writer, a previous schema version, a manual SQL `INSERT`), so validate when reading. The emit-path renderer reads `expression` directly so `contract.d.ts` carries the schema's source-like rendering with full fidelity. The postgres adapter retains only the non-parameterized raw-JSON / raw-JSONB codecs (`pg/json@1`, `pg/jsonb@1`) — schema-typed JSON columns ship from extension packages. Future per-library extensions (`zod/json@1`, `valibot/json@1`) follow the same pattern when each library has a clean serialize / rehydrate story. @@ -188,8 +192,8 @@ The intermediate `CodecParamsDescriptor
` type at the adapter compile-time bou ## Resolves -- **TML-2229.** `vector(1536)`, `arktypeJson(schema)`, and other parameterized columns resolve correctly in the no-emit path AND through the emit path (typeRef columns included, via `EmissionSpi.resolveFieldTypeParams`). -- **The deferred no-emit fix from [ADR 186](ADR%20186%20-%20Codec-dispatched%20type%20rendering.md).** The `renderOutputType` it introduced moves to its long-term home on the descriptor; the no-emit path now resolves through the factory's return type without consulting it. +- **TML-2229 (emit path).** `vector(1536)`, `arktypeJson(schema)`, and other parameterized columns resolve correctly through the emit path (typeRef columns included, via `EmissionSpi.resolveFieldTypeParams`). The no-emit equivalent — `FieldOutputType` walking `typeRef` and reading the column's authoring-time `type` slot — is **not** yet implemented; tracked under TML-2357 alongside the registration-side migration. See the partial-status block in § "No-emit type resolution". +- **The deferred no-emit `renderOutputType` placement from [ADR 186](ADR%20186%20-%20Codec-dispatched%20type%20rendering.md).** The renderer moves to its long-term home on the descriptor. The no-emit *consumption* of that renderer (resolving through the factory's return type at the type level) is the part that ships under TML-2357. ## References diff --git a/docs/architecture docs/subsystems/9. No-Emit Workflow.md b/docs/architecture docs/subsystems/9. No-Emit Workflow.md index abb844c3ad..1be0ea8720 100644 --- a/docs/architecture docs/subsystems/9. No-Emit Workflow.md +++ b/docs/architecture docs/subsystems/9. No-Emit Workflow.md @@ -57,7 +57,9 @@ export const contract = defineContract({ #### Parameterized columns: `vector(1536)`, `arktypeJson(schema)` -For parameterized codecs, the column-author factory returns a `ColumnTypeDescriptor` with an authoring-time `type: (ctx: CodecInstanceContext) => Codec<…, S['infer']>` slot. 
The no-emit `FieldOutputType` resolver synthetically applies `CodecInstanceContext` at the type level and reads the `Js` parameter off the resulting `Codec<…, Js>`. This preserves literal `N` (e.g. `Vector<1536>`) and inferred schema outputs (e.g. arktype's narrowed types) end-to-end without an emit step. At runtime, the same factory the column author wrote is invoked with parameters round-tripped through the contract IR — there is no parallel runtime function and no opportunity for drift between the no-emit type and the runtime instance. See [ADR 208 — Higher-order codecs for parameterized types](../adrs/ADR%20208%20-%20Higher-order%20codecs%20for%20parameterized%20types.md). +> **Status — partial.** Parameterized columns currently resolve to their codec's base output type in no-emit mode (`vector(1536)` → `number[]`, `arktypeJson(schema)` → `unknown`). The `FieldOutputType` resolver wiring that walks `typeRef` and reads the `Js` parameter off the column's authoring-time `type` slot is not yet implemented; tracked under TML-2357. Until it lands, run `pnpm emit` to obtain the parameterized output types in `contract.d.ts`. + +For parameterized codecs, the column-author factory returns a `ColumnTypeDescriptor` with an authoring-time `type: (ctx: CodecInstanceContext) => Codec<…, S['infer']>` slot. Once the resolver lands, the no-emit `FieldOutputType` will synthetically apply `CodecInstanceContext` at the type level and read the `Js` parameter off the resulting `Codec<…, Js>`. This will preserve literal `N` (e.g. `Vector<1536>`) and inferred schema outputs (e.g. arktype's narrowed types) end-to-end without an emit step. At runtime, the same factory the column author wrote is invoked with parameters round-tripped through the contract IR — there is no parallel runtime function and no opportunity for drift between the no-emit type and the runtime instance.
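The synthetic type-level application described above can be sketched in isolation. Everything below is a simplified stand-in — `Codec`, `ColumnTypeDescriptor`, `CodecInstanceContext`, `FieldOutputType`, and the branded `Vector` are illustrative approximations of the `@prisma-next` types, not the shipped definitions:

```typescript
// Minimal stand-ins for the real @prisma-next types — shapes simplified
// to isolate the Js-extraction mechanism; none of these are the shipped
// definitions.
interface CodecInstanceContext {
  readonly table: string;
  readonly column: string;
}

interface Codec<Id extends string, Js> {
  readonly id: Id;
  decode(wire: unknown, ctx: CodecInstanceContext): Js;
}

// Hypothetical branded vector type; the real one ships with the pgvector pack.
type Vector<N extends number> = readonly number[] & { readonly dims: N };

// The authoring-time `type` slot: never serialized, never invoked for
// typing — only "applied" synthetically at the type level.
interface ColumnTypeDescriptor<Id extends string, Js> {
  readonly codecId: Id;
  readonly type?: (ctx: CodecInstanceContext) => Codec<Id, Js>;
}

// Sketch of the resolver step: pattern-match the slot's return type and
// read off `Js`. (The real resolver falls back to the codec's base
// output type when no slot is present; this sketch falls back to
// `unknown`.)
type FieldOutputType<D> = D extends {
  readonly type?: (ctx: CodecInstanceContext) => Codec<string, infer Js>;
}
  ? Js
  : unknown;

// A vector(1536) column: the literal 1536 survives the curried application.
declare const vectorColumn: ColumnTypeDescriptor<'pg/vector@1', Vector<1536>>;
type VectorOut = FieldOutputType<typeof vectorColumn>; // Vector<1536>
```

Note that the conditional type never calls the factory; it only pattern-matches the slot's return type. Both the no-emit type and the runtime instance derive from the one factory the column author wrote, which is what rules out drift between them.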
See [ADR 208 — Higher-order codecs for parameterized types](../adrs/ADR%20208%20-%20Higher-order%20codecs%20for%20parameterized%20types.md). ```ts const contract = defineContract({ diff --git a/packages/1-framework/1-core/framework-components/src/shared/codec-types.ts b/packages/1-framework/1-core/framework-components/src/shared/codec-types.ts index e21bbd9605..bc48f59459 100644 --- a/packages/1-framework/1-core/framework-components/src/shared/codec-types.ts +++ b/packages/1-framework/1-core/framework-components/src/shared/codec-types.ts @@ -199,6 +199,27 @@ export interface CodecDescriptor
{ * with `ctx` carrying the column set the resulting codec serves. */ readonly factory: (params: P) => (ctx: CodecInstanceContext) => Codec; + /** + * Declares that the codec's `encode` produces structurally equivalent + * wire output regardless of `params` — i.e. picking any resolved + * instance of this codec id at the encode-side `forCodecId` lookup + * yields the same wire payload. Optional; defaults to `false`. + * + * When `true`, the runtime registry does NOT mark the codec id as + * ambiguous when multiple distinct resolved instances share it (e.g. + * two `arktypeJson(...)` columns with different schemas). The encode + * dispatch can pick any of the resolved instances safely; decode + * dispatch still uses `forColumn(table, column)` to get the + * instance-specific schema. + * + * This is the AC-5-deferred bridge for parameterized codecs whose + * encode is intrinsically per-call-stateless w.r.t. params (pgvector + * formats `[v1,v2,...]` regardless of dimension; arktype-json's + * encode is `JSON.stringify` with no schema check). Once + * `ParamRef.refs` plumbing lands (TML-2357 § AC-5), encode will use + * `forColumn` directly and this flag becomes vestigial. + */ + readonly encodeIsParamsIndependent?: boolean; } /** diff --git a/packages/2-sql/5-runtime/src/codecs/decoding.ts b/packages/2-sql/5-runtime/src/codecs/decoding.ts index f6c186bd4b..645448933f 100644 --- a/packages/2-sql/5-runtime/src/codecs/decoding.ts +++ b/packages/2-sql/5-runtime/src/codecs/decoding.ts @@ -250,6 +250,22 @@ async function decodeField( try { decoded = await codec.decode(wireValue, cellCtx); } catch (error) { + // Pass-through stable runtime envelopes: + // + // - `RUNTIME.JSON_SCHEMA_VALIDATION_FAILED`: per-library JSON-with- + // schema codecs (e.g. `arktype/json@1`) validate inside `decode` + // and throw the stable schema-failure code directly. ADR 208 + // promises this code surfaces unchanged. 
+ // - `RUNTIME.DECODE_FAILED`: a codec body that already constructed + // the wrapped envelope itself (carrying its own `details`/`cause` + // contract) must pass through, not be re-wrapped. This matches the + // "no double wrap" guarantee documented on `decodeRow` below. + // + // The post-decode `validateJsonValue` path below has the same + // schema-failure rethrow guard for the legacy validator-registry + // case. + if (isJsonSchemaValidationError(error)) throw error; + if (isRuntimeError(error) && error.code === 'RUNTIME.DECODE_FAILED') throw error; wrapDecodeFailure(error, alias, ref, codec, wireValue); } diff --git a/packages/2-sql/5-runtime/src/codecs/encoding.ts b/packages/2-sql/5-runtime/src/codecs/encoding.ts index dd27d54743..6439fb83a4 100644 --- a/packages/2-sql/5-runtime/src/codecs/encoding.ts +++ b/packages/2-sql/5-runtime/src/codecs/encoding.ts @@ -1,5 +1,6 @@ import { checkAborted, + isRuntimeError, raceAgainstAbort, runtimeError, } from '@prisma-next/framework-components/runtime'; @@ -125,6 +126,27 @@ async function encodeParamValue( try { return await codec.encode(value, ctx); } catch (error) { + // Pass-through stable runtime envelopes: + // + // - `RUNTIME.JSON_SCHEMA_VALIDATION_FAILED`: per-library JSON-with- + // schema codecs may validate inside `encode` (ADR 208 § Case J) and + // throw the stable schema-failure code directly. The unified codec + // descriptor model promises this code surfaces unchanged on both + // directions of the wire boundary. + // - `RUNTIME.ENCODE_FAILED`: a codec body that already constructed + // the wrapped envelope itself (carrying its own `details`/`cause` + // contract) must pass through, not be re-wrapped. This matches the + // "no double wrap" guarantee documented on `encodeParams` below. + // + // Anything else flows through `wrapEncodeFailure` to produce a + // canonical `RUNTIME.ENCODE_FAILED` envelope for un-stamped errors.
+ if ( + isRuntimeError(error) && + (error.code === 'RUNTIME.JSON_SCHEMA_VALIDATION_FAILED' || + error.code === 'RUNTIME.ENCODE_FAILED') + ) { + throw error; + } wrapEncodeFailure(error, metadata, paramIndex, codec.id); } } diff --git a/packages/2-sql/5-runtime/src/sql-context.ts b/packages/2-sql/5-runtime/src/sql-context.ts index bac980874a..1798de1e7f 100644 --- a/packages/2-sql/5-runtime/src/sql-context.ts +++ b/packages/2-sql/5-runtime/src/sql-context.ts @@ -551,7 +551,24 @@ function buildContractCodecRegistry( if (existing === undefined) { byCodecId.set(column.codecId, resolvedCodec); } else if (existing !== resolvedCodec && parameterizedDescriptors.has(column.codecId)) { - ambiguousCodecIds.add(column.codecId); + // Two distinct resolved instances under the same parameterized + // codec id (e.g. `Vector<1024>` and `Vector<1536>`, or two + // `arktypeJson(...)` columns with different schemas). The + // encode-side `forCodecId` fallback can't honor a column- + // specific call site, so by default we mark the codec id + // ambiguous and reject the fallback. + // + // Opt-out: descriptors that declare `encodeIsParamsIndependent` + // produce wire-identical output across all resolved instances + // (pgvector formats `[v1,v2,…]` regardless of dimension; + // arktype-json's encode is `JSON.stringify` with no schema + // check). For those, picking any resolved instance at the + // encode call site is safe — decode dispatch still uses + // `forColumn` to get the instance-specific schema. 
+ const parameterizedDescriptor = parameterizedDescriptors.get(column.codecId); + if (!parameterizedDescriptor?.encodeIsParamsIndependent) { + ambiguousCodecIds.add(column.codecId); + } } } } diff --git a/packages/2-sql/5-runtime/test/contract-codec-registry.test.ts b/packages/2-sql/5-runtime/test/contract-codec-registry.test.ts index 11e46e9b01..e8d3d77fea 100644 --- a/packages/2-sql/5-runtime/test/contract-codec-registry.test.ts +++ b/packages/2-sql/5-runtime/test/contract-codec-registry.test.ts @@ -289,6 +289,113 @@ describe('ContractCodecRegistry', () => { expect(context.contractCodecs.forCodecId('does-not-exist@1')).toBeUndefined(); }); + + // Two parameterized columns with distinct typeParams resolve to two + // distinct codec instances under the same codec id. By default that's + // ambiguous — `forCodecId` rejects rather than silently bind to the + // first registered instance. + it('forCodecId throws RUNTIME.TYPE_PARAMS_INVALID when multiple distinct instances share a parameterized codec id', () => { + const contract = createTestContract({ + Doc: { + small: { + nativeType: 'vector', + codecId: 'pg/vector@1', + nullable: false, + typeParams: { length: 768 }, + }, + large: { + nativeType: 'vector', + codecId: 'pg/vector@1', + nullable: false, + typeParams: { length: 1536 }, + }, + }, + }); + + const context = createTestContext(contract, createStubAdapter(), { + extensionPacks: [createVectorExtensionDescriptor()], + }); + + expect(() => context.contractCodecs.forCodecId('pg/vector@1')).toThrow( + /resolves to multiple parameterized instances/, + ); + }); + + // Descriptors that declare `encodeIsParamsIndependent: true` opt out of + // the ambiguity rejection. Two distinct resolved instances under the + // same codec id are then acceptable for the encode-side `forCodecId` + // fallback because every instance encodes equivalently. Decode still + // uses `forColumn` to get the instance-specific schema. 
+ it('forCodecId tolerates multiple instances when descriptor.encodeIsParamsIndependent is true', () => { + // Build a parameterized descriptor flagged params-independent. + const factory: (params: { length: number }) => (ctx: CodecInstanceContext) => Codec = + (params) => () => + makeVectorCodec({ length: params.length }); + + const paramsIndependentDescriptor: SqlRuntimeExtensionDescriptor<'postgres'> = { + kind: 'extension' as const, + id: 'pgvector-params-independent', + version: '0.0.1', + familyId: 'sql' as const, + targetId: 'postgres' as const, + codecs: () => { + const r = createCodecRegistry(); + r.register(makeVectorCodec()); + return r; + }, + parameterizedCodecs: () => [ + { + codecId: 'pg/vector@1', + traits: ['equality'], + targetTypes: ['vector'], + paramsSchema: { + '~standard': { + version: 1, + vendor: 'test', + validate: (value) => ({ value: value as { length: number } }), + }, + }, + factory, + encodeIsParamsIndependent: true, + }, + ], + create() { + return { familyId: 'sql' as const, targetId: 'postgres' as const }; + }, + }; + + const contract = createTestContract({ + Doc: { + small: { + nativeType: 'vector', + codecId: 'pg/vector@1', + nullable: false, + typeParams: { length: 768 }, + }, + large: { + nativeType: 'vector', + codecId: 'pg/vector@1', + nullable: false, + typeParams: { length: 1536 }, + }, + }, + }); + + const context = createTestContext(contract, createStubAdapter(), { + extensionPacks: [paramsIndependentDescriptor], + }); + + // forCodecId resolves to one of the registered instances rather than + // throwing. Per-column dispatch via forColumn still distinguishes + // them. 
+ const fromCodecId = context.contractCodecs.forCodecId('pg/vector@1'); + expect(fromCodecId).toBeDefined(); + + const small = context.contractCodecs.forColumn('Doc', 'small'); + const large = context.contractCodecs.forColumn('Doc', 'large'); + expect((small as Codec & { meta: { length: number } }).meta.length).toBe(768); + expect((large as Codec & { meta: { length: number } }).meta.length).toBe(1536); + }); }); describe('CodecDescriptorRegistry', () => { diff --git a/packages/2-sql/5-runtime/test/json-schema-validation.test.ts b/packages/2-sql/5-runtime/test/json-schema-validation.test.ts index 6f1a6b5aac..2a6a02cc08 100644 --- a/packages/2-sql/5-runtime/test/json-schema-validation.test.ts +++ b/packages/2-sql/5-runtime/test/json-schema-validation.test.ts @@ -380,6 +380,80 @@ describe('JSON Schema encoding validation', () => { ); expect(result).toBe('{"age":30}'); }); + + // Symmetric to the decode-side guard: per-library JSON-with-schema + // codecs (e.g. `arktype/json@1`) validate inside `encode` and throw + // `RUNTIME.JSON_SCHEMA_VALIDATION_FAILED` directly. The runtime must + // surface that stable code unchanged on the write side — not rewrap + // it as `RUNTIME.ENCODE_FAILED`. + it('preserves RUNTIME.JSON_SCHEMA_VALIDATION_FAILED thrown from codec.encode', async () => { + const inlineValidatingRegistry = createCodecRegistry(); + inlineValidatingRegistry.register( + codec<'inline/json@1', readonly [], string, JsonValue>({ + typeId: 'inline/json@1', + targetTypes: ['jsonb'], + encode: () => { + throw Object.assign(new Error('inline schema rejected payload on write'), { + code: 'RUNTIME.JSON_SCHEMA_VALIDATION_FAILED', + category: 'RUNTIME', + severity: 'error', + details: { codecId: 'inline/json@1' }, + }); + }, + decode: (w: string) => (typeof w === 'string' ? 
JSON.parse(w) : w) as JsonValue, + }), + ); + + await expect( + encodeParam( + { wrong: 'shape' }, + { codecId: 'inline/json@1' }, + 0, + inlineValidatingRegistry, + {}, + ), + ).rejects.toMatchObject({ + code: 'RUNTIME.JSON_SCHEMA_VALIDATION_FAILED', + }); + }); + + // Codec bodies may also stamp a fully-formed `RUNTIME.ENCODE_FAILED` + // envelope themselves (carrying their own `details`/`cause` contract). + // Re-wrapping that envelope through `wrapEncodeFailure` would drop the + // codec-author-supplied details and produce a misleading double-wrap. + // The doc on `encodeParams` later in this file already promises the + // pass-through; pin it. + it('passes RUNTIME.ENCODE_FAILED envelopes from codec.encode through unchanged', async () => { + const stampingRegistry = createCodecRegistry(); + stampingRegistry.register( + codec<'stamped/encode@1', readonly [], string, JsonValue>({ + typeId: 'stamped/encode@1', + targetTypes: ['jsonb'], + encode: () => { + throw Object.assign(new Error('codec-stamped envelope'), { + code: 'RUNTIME.ENCODE_FAILED', + category: 'RUNTIME', + severity: 'error', + details: { + codecId: 'stamped/encode@1', + codecAuthorContext: 'preserved verbatim', + }, + }); + }, + decode: (w: string) => (typeof w === 'string' ? JSON.parse(w) : w) as JsonValue, + }), + ); + + await expect( + encodeParam({ ignored: 'value' }, { codecId: 'stamped/encode@1' }, 0, stampingRegistry, {}), + ).rejects.toMatchObject({ + code: 'RUNTIME.ENCODE_FAILED', + details: { + codecAuthorContext: 'preserved verbatim', + }, + message: 'codec-stamped envelope', + }); + }); }); // ============================================================================= @@ -551,6 +625,88 @@ describe('JSON Schema decoding validation', () => { }); }); + // ADR 208 § Case J: per-library JSON-with-schema codecs (e.g. + // `arktype/json@1`) validate inside `decode` and throw + // `RUNTIME.JSON_SCHEMA_VALIDATION_FAILED` directly. 
The runtime must + // surface that stable code unchanged — without the rethrow guard in + // `decodeField`, `wrapDecodeFailure` would re-wrap the error as + // `RUNTIME.DECODE_FAILED` and the documented error contract would + // break for the inline-validation path. + it('preserves RUNTIME.JSON_SCHEMA_VALIDATION_FAILED thrown from codec.decode', async () => { + const inlineValidatingRegistry = createCodecRegistry(); + inlineValidatingRegistry.register( + codec<'inline/json@1', readonly [], string, JsonValue>({ + typeId: 'inline/json@1', + targetTypes: ['jsonb'], + encode: (v: JsonValue) => JSON.stringify(v), + decode: () => { + throw Object.assign(new Error('inline schema rejected payload'), { + code: 'RUNTIME.JSON_SCHEMA_VALIDATION_FAILED', + category: 'RUNTIME', + severity: 'error', + details: { codecId: 'inline/json@1' }, + }); + }, + }), + ); + + const plan = createTestPlan({ + ast: projectionAst([ + ProjectionItem.of('metadata', ColumnRef.of('user', 'metadata'), 'inline/json@1'), + ]), + }); + + const row = { metadata: '{"anything":"goes"}' }; + await expect( + decodeRow(row, plan, inlineValidatingRegistry, undefined, {}), + ).rejects.toMatchObject({ + code: 'RUNTIME.JSON_SCHEMA_VALIDATION_FAILED', + }); + }); + + // Symmetric to the encode-side guard: a codec body may stamp a fully- + // formed `RUNTIME.DECODE_FAILED` envelope itself, with its own + // `details`/`cause` contract. Re-wrapping through `wrapDecodeFailure` + // would drop the codec-author-supplied details and produce a + // misleading double-wrap. The doc on `decodeRow` already promises this + // pass-through; pin it. 
+ it('passes RUNTIME.DECODE_FAILED envelopes from codec.decode through unchanged', async () => { + const stampingRegistry = createCodecRegistry(); + stampingRegistry.register( + codec<'stamped/decode@1', readonly [], string, JsonValue>({ + typeId: 'stamped/decode@1', + targetTypes: ['jsonb'], + encode: (v: JsonValue) => JSON.stringify(v), + decode: () => { + throw Object.assign(new Error('codec-stamped decode envelope'), { + code: 'RUNTIME.DECODE_FAILED', + category: 'RUNTIME', + severity: 'error', + details: { + codecId: 'stamped/decode@1', + codecAuthorContext: 'preserved verbatim', + }, + }); + }, + }), + ); + + const plan = createTestPlan({ + ast: projectionAst([ + ProjectionItem.of('metadata', ColumnRef.of('user', 'metadata'), 'stamped/decode@1'), + ]), + }); + + const row = { metadata: '{"anything":"goes"}' }; + await expect(decodeRow(row, plan, stampingRegistry, undefined, {})).rejects.toMatchObject({ + code: 'RUNTIME.DECODE_FAILED', + details: { + codecAuthorContext: 'preserved verbatim', + }, + message: 'codec-stamped decode envelope', + }); + }); + // --------------------------------------------------------------------------- // Codec-authored error.message redaction — DEFERRED follow-up. // diff --git a/packages/3-extensions/arktype-json/README.md b/packages/3-extensions/arktype-json/README.md index 56ef1ba171..14dde9e8cb 100644 --- a/packages/3-extensions/arktype-json/README.md +++ b/packages/3-extensions/arktype-json/README.md @@ -55,6 +55,12 @@ In the emitted `contract.d.ts`, `Product.spec` resolves to `{ name: string; price: number; description?: string }` — the schema's expression renders directly into the field type. +> **No-emit caveat.** Today, importing the TS contract directly without +> running `pnpm emit` resolves `Product.spec` to `unknown` (the codec's +> base `output` type). The schema's inferred shape only flows into the +> field type after emit. 
Parameterized no-emit resolution is tracked +> under TML-2357 — see [ADR 208 § No-emit type resolution](../../../docs/architecture%20docs/adrs/ADR%20208%20-%20Higher-order%20codecs%20for%20parameterized%20types.md). + ## Pack registration Add the runtime descriptor to your runtime stack and the control descriptor @@ -78,6 +84,14 @@ const stack = createSqlExecutionStack({ }); ``` +## Compatibility + +Codec stability depends on a round-trip invariant: `ark.schema(typeParams.jsonIr).expression === typeParams.expression`. The emit-path renderer reads `expression` directly, so a contract emitted against arktype `X` and rehydrated against arktype `Y` produces correct types only as long as that invariant holds across `X→Y`. + +The package's `arktype` dependency is pinned to a tilde range (`~2.1.29`) — patch upgrades are accepted, minor and major upgrades are not. Bumping the range without a coordinated re-emit of every contract using `arktype/json@1` risks emit-path output going stale relative to the rehydrated runtime schema. Consumers who upgrade `arktype` outside this range should re-run `pnpm emit` and verify `contract.d.ts` matches expectations. + +The runtime enforces the invariant defensively: the codec's factory runs at execution-context construction time (typically when `runtime.connect()` is called), and throws `RUNTIME.TYPE_PARAMS_INVALID` if the rehydrated schema's `expression` doesn't match the serialized one. So a stale-but-shape-valid `contract.json` fails fast at startup rather than rendering wrong types in user code. The error message points at re-running `pnpm emit`. + ## Notes - The codec is library-bound (`arktype/json@1`), not target-bound. 
Other diff --git a/packages/3-extensions/arktype-json/package.json b/packages/3-extensions/arktype-json/package.json index 87d11a4665..fc7c6ac6a7 100644 --- a/packages/3-extensions/arktype-json/package.json +++ b/packages/3-extensions/arktype-json/package.json @@ -20,7 +20,7 @@ "@prisma-next/framework-components": "workspace:*", "@prisma-next/sql-relational-core": "workspace:*", "@prisma-next/sql-runtime": "workspace:*", - "arktype": "^2.1.29" + "arktype": "~2.1.29" }, "devDependencies": { "@prisma-next/sql-contract": "workspace:*", diff --git a/packages/3-extensions/arktype-json/src/core/arktype-json-codec.ts b/packages/3-extensions/arktype-json/src/core/arktype-json-codec.ts index d4ba961950..c4d9d0938f 100644 --- a/packages/3-extensions/arktype-json/src/core/arktype-json-codec.ts +++ b/packages/3-extensions/arktype-json/src/core/arktype-json-codec.ts @@ -75,12 +75,14 @@ export type ArktypeJsonTypeParams = { /** * Codec instance returned by `arktypeJson(schema)(ctx)` and by * `arktypeJsonCodec.factory(typeParams)(ctx)`. The `TInferred` slot - * carries the arktype schema's inferred output type. + * carries the arktype schema's inferred output type. The wire type is + * `string | JsonValue` to accommodate Postgres drivers that return + * `jsonb` cells as already-parsed JS values. */ export type ArktypeJsonCodec = Codec< typeof ARKTYPE_JSON_CODEC_ID, readonly ['equality'], - string, + string | JsonValue, TInferred >; @@ -149,51 +151,91 @@ function arktypeJsonCodecForSchema( } // Derive both `encode` (wire string) and `encodeJson` (JsonValue) - // outputs from the same `JSON.stringify` → `JSON.parse` round-trip, - // then validate the normalized payload through the schema. Without - // this normalization, a non-JSON-safe runtime value (e.g. 
a class - // instance, a function field on a narrowed type) could slip through - // `encodeJson` unchanged while `encode` silently dropped or - // transformed it — producing wire payloads the codec's own decode - // path would later reject. The serialize/parse round-trip also - // produces the JSON-safe shape required by the contract IR's - // `JsonValue` surface, so `encodeJson` no longer needs a blind cast. + // outputs from the same `JSON.stringify` → `JSON.parse` round-trip. + // Encode is intentionally **schema-independent**: the schema check + // runs only on `decode` (and `decodeJson`), matching the JSON-validator + // philosophy that wire payloads may originate from any source — this + // writer, a previous schema version, a manual SQL `INSERT`, a sibling + // service. Validation belongs at the read boundary. + // + // Beyond the philosophical fit, encode-side schema validation would + // make the resolved codec parameter-dependent. Today's encode dispatch + // (`encodeParamValue` → `contractCodecs.forCodecId(codecId)`) carries + // only the codec id, so two `arktypeJson(...)` columns with distinct + // schemas would resolve through `forCodecId` and hit the registry's + // ambiguity rejection (`RUNTIME.TYPE_PARAMS_INVALID`). Until + // `ParamRef.refs` plumbing lands (TML-2357 § AC-5), encode must stay + // params-independent or arktype-json contracts with multiple typed + // JSON columns become unusable. function serializeToJsonSafe(value: TInferred): { wire: string; json: JsonValue } { // `JSON.stringify` returns `string | undefined` — `undefined` // happens when the input is `undefined` itself or contains only - // unserializable values (functions, symbols). Reject explicitly so - // the caller sees the schema-failure code rather than a downstream - // `JSON.parse(undefined)` SyntaxError. + // unserializable values (functions, symbols). 
Reject explicitly + // so the caller sees a clear `RUNTIME.ENCODE_FAILED` envelope + // (the runtime wraps this throw via `wrapEncodeFailure`) rather + // than a downstream `JSON.parse(undefined)` SyntaxError. const wire: string | undefined = JSON.stringify(value); if (typeof wire !== 'string') { - throw runtimeError( - 'RUNTIME.JSON_SCHEMA_VALIDATION_FAILED', + throw new Error( `arktype-json value is not representable as JSON (codecId: ${ARKTYPE_JSON_CODEC_ID})`, - { codecId: ARKTYPE_JSON_CODEC_ID }, ); } const json = JSON.parse(wire) as JsonValue; - // Validate the normalized payload — the round-trip strips - // class-prototype shape and arktype-narrowed fields, and the - // schema must still accept the result. Run validation and discard - // its return value (we keep `json` as the JsonValue, not the - // schema's `inferOut` which already matches `TInferred`). - validateSchema(json); return { wire, json }; } return (_ctx) => - codec({ + codec({ typeId: ARKTYPE_JSON_CODEC_ID, targetTypes: [ARKTYPE_JSON_NATIVE_TYPE], traits: ['equality'] as const, encode: (value: TInferred): string => serializeToJsonSafe(value).wire, - decode: (wire: string): TInferred => validateSchema(JSON.parse(wire)), + decode: (wire: string | JsonValue): TInferred => validateSchema(parseWireValue(wire)), encodeJson: (value: TInferred): JsonValue => serializeToJsonSafe(value).json, decodeJson: (json: JsonValue) => validateSchema(json), }) as ArktypeJsonCodec; } +/** + * Normalize a JSONB wire value to its decoded shape regardless of the + * driver's pre-parsing behavior. + * + * Postgres `jsonb` columns can come back from the driver as either a raw + * JSON-text string (e.g. `'{"name":"alice"}'`) or an already-parsed JS + * value (object/array/primitive). The `pg` driver pre-parses by default; + * other drivers vary. 
The codec doesn't know which mode it's in, and the + * two are indistinguishable at the type level (`JsonValue` includes + * `string`, so a pre-parsed JSONB string primitive `"alice"` arrives as + * the bare JS string `alice`). + * + * Resolution: when the wire is a string, attempt `JSON.parse`. If that + * succeeds, the wire was raw JSON text — use the parsed value. If it + * throws (the string isn't valid JSON syntax), the wire was a pre-parsed + * JSON string primitive — pass it through unchanged. Non-string wires + * (objects, arrays, numbers, booleans, `null`) are already pre-parsed + * and pass through directly. + * + * Edge case: a stored JSON string `"hello"` arrives pre-parsed as `hello`. + * `JSON.parse("hello")` throws, so the original string is returned — + * correct. A stored JSON string `"123"` arrives pre-parsed as `123`. But + * `JSON.parse("123")` returns the number `123` — wrong if the stored + * value was the string `"123"`. This collision is intrinsic to the + * driver-mode ambiguity and matches what `pgJsonbCodec` does today; if + * lossless distinction is required, callers should disable driver + * pre-parsing or pin to a driver mode that reports the wire shape. + */ +function parseWireValue(wire: string | JsonValue): JsonValue { + if (typeof wire !== 'string') return wire; + try { + return JSON.parse(wire) as JsonValue; + } catch (_error) { + // Wire isn't valid JSON syntax — driver pre-parsed a JSON string + // primitive (e.g. `"alice"` → `alice`). Pass it through and let the + // schema decide if a string is acceptable. + return wire; + } +} + // ── Column-author surface ──────────────────────────────────────────────── /** @@ -319,34 +361,78 @@ function renderArktypeJsonOutputTypeFromUnknownParams( } /** - * Emit-only `Codec` instance for `arktype/json@1`. 
Threaded through the - * pack-meta's `codecInstances` array so the emitter's `CodecLookup` can - * find a `renderOutputType` for the codec id (the emitter consults the - * codec-id-keyed `CodecLookup` at the framework boundary; the unified - * descriptor's `renderOutputType` is the long-term home for the renderer - * but the emit-path glue still routes through `CodecLookup`). + * Metadata `Codec` instance for `arktype/json@1`. Threaded through the + * pack-meta's `codecInstances` array (control plane) AND the runtime + * descriptor's `types.codecTypes.codecInstances` (runtime plane) so two + * codec-id-keyed lookups resolve: + * + * - The framework emitter's `CodecLookup` reads `renderOutputType` to + * stamp `Vector<…>` / arktype-schema-shaped types into `contract.d.ts`. + * - The Postgres SQL renderer's `extractCodecLookup` reads + * `meta.db.sql.postgres.nativeType` to render `$N::jsonb` casts at + * parameter binding sites (`json` / `jsonb` are excluded from + * `POSTGRES_INFERRABLE_NATIVE_TYPES`, so the cast is not optional). * - * All conversion methods are sentinels that throw if invoked — runtime - * materialization always goes through `arktypeJsonCodec.factory`'s - * curried `(params) => (ctx) => Codec`, never through this instance. - * `encodeJson`/`decodeJson` throw alongside `encode`/`decode` so a - * mistaken contract-load that resolved to this stub fails fast at the - * JSON boundary instead of silently returning unvalidated payloads. A - * future cleanup could route the emit path through the descriptor map - * directly and retire this shim. + * The declared type is the framework `Codec` plus a structural intersection + * carrying `meta`. We intentionally do NOT widen to the SQL `Codec` + * extension here: `meta` is the only SQL-leaning slot the stub needs, and + * coupling the family-agnostic descriptor's `codecInstances` slot to a + * specific family layer would block reuse for any future non-SQL family + * (e.g. 
a Mongo arktype variant) that wants the same renderer-lookup
+ * shape. The SQL renderer reads `meta` structurally via its own
+ * `as SqlCodec | undefined` cast at the lookup boundary, so the field
+ * is consumed without requiring the source declaration to participate
+ * in the SQL family's type hierarchy.
+ *
+ * Conversion methods (`encode` / `decode` / `encodeJson` / `decodeJson`)
+ * are sentinels that throw if invoked — runtime dispatch goes through
+ * `arktypeJsonCodec.factory(params)(ctx)` via the unified descriptor map,
+ * never through this instance. The sentinels exist so a mistaken
+ * contract-load that resolved to this stub fails fast at the JSON
+ * boundary instead of silently returning unvalidated payloads. A future
+ * cleanup that routes the emit path through `descriptorFor` and the
+ * runtime cast lookup through the descriptor map retires this shim
+ * (TML-2357).
  */

 const ARKTYPE_JSON_RUNTIME_DISPATCH_ERROR =
-  'arktype-json codec instances must be materialized via the descriptor factory; this is an emit-only stub';
+  'arktype-json codec instances must be materialized via the descriptor factory; this is a metadata-only stub';
+
+/**
+ * Structural shape of the SQL renderer's `meta.db.sql.<target>.nativeType`
+ * read. Co-located with the codec rather than imported from
+ * `sql-relational-core` so this package's family-agnostic codec stub
+ * doesn't depend on the SQL family's type hierarchy.
+ */ +type SqlNativeTypeMeta = { + readonly db: { + readonly sql: { + readonly postgres: { + readonly nativeType: 'jsonb'; + }; + }; + }; +}; export const arktypeJsonEmitCodec: Codec< typeof ARKTYPE_JSON_CODEC_ID, readonly ['equality'], string, unknown -> = { +> & { + readonly meta: SqlNativeTypeMeta; +} = { id: ARKTYPE_JSON_CODEC_ID, targetTypes: [ARKTYPE_JSON_NATIVE_TYPE], traits: ['equality'] as const, + meta: { + db: { + sql: { + postgres: { + nativeType: 'jsonb', + }, + }, + }, + }, encode: () => Promise.reject(new Error(ARKTYPE_JSON_RUNTIME_DISPATCH_ERROR)), decode: () => Promise.reject(new Error(ARKTYPE_JSON_RUNTIME_DISPATCH_ERROR)), encodeJson: () => { @@ -379,24 +465,37 @@ export const arktypeJsonCodec: CodecDescriptor = { targetTypes: [ARKTYPE_JSON_NATIVE_TYPE] as const, paramsSchema: arktypeJsonParamsSchema, renderOutputType: renderArktypeJsonOutputType, + // arktype-json's `encode` is `JSON.stringify` with no schema check + // (validation runs on decode only). Distinct schemas produce distinct + // resolved codec instances under the same `arktype/json@1` codec id, + // but every instance encodes equivalently — so the runtime registry's + // ambiguity guard would be over-conservative without this opt-in. + // Allows contracts to carry multiple `arktypeJson(...)` columns + // pre-TML-2357 (before `ParamRef.refs` lands). + encodeIsParamsIndependent: true, factory: (params) => { const schema = rehydrateSchema(params.jsonIr); - /* c8 ignore start — defensive parity check; not exercised by typical contracts */ - // The rehydrated schema's `expression` should match the serialized - // one; diverging means contract.json was hand-edited out from under - // the emit-path renderer. Surface as a soft warning at materialization - // time so the caller knows their emit output may not match the - // runtime schema. 
The runtime keeps using the schema rehydrated from - // `jsonIr` — that's the lossless source — so the worst case is an - // emit-vs-runtime divergence at a single column, not a runtime - // failure. + // The rehydrated schema's `expression` must match the serialized one. + // A mismatch means either contract.json was hand-edited or the + // installed arktype version's IR-to-expression rendering diverged from + // the version that produced contract.json — in both cases the emitted + // `contract.d.ts` is no longer faithful to the runtime schema. Fail at + // contract-load rather than warn: a runtime warning fires after the + // wrong types have already shipped to consumers, and silent drift is + // exactly what the round-trip-stability invariant is supposed to + // prevent. See § Compatibility in the package README. const rehydratedExpression = (schema as { readonly expression?: unknown }).expression; if (typeof rehydratedExpression === 'string' && rehydratedExpression !== params.expression) { - console.warn( - `[arktype-json] typeParams.expression (${params.expression}) does not match rehydrated schema expression (${rehydratedExpression}); contract.json may be stale relative to the runtime schema.`, + throw runtimeError( + 'RUNTIME.TYPE_PARAMS_INVALID', + `arktype-json: typeParams.expression (${params.expression}) does not match the rehydrated schema's expression (${rehydratedExpression}). 
The contract was likely emitted against a different arktype version or hand-edited; re-run \`pnpm emit\` to regenerate.`, + { + codecId: ARKTYPE_JSON_CODEC_ID, + serializedExpression: params.expression, + rehydratedExpression, + }, ); } - /* c8 ignore stop */ return arktypeJsonCodecForSchema(schema); }, }; diff --git a/packages/3-extensions/arktype-json/src/exports/runtime.ts b/packages/3-extensions/arktype-json/src/exports/runtime.ts index a8d571b818..2832f89681 100644 --- a/packages/3-extensions/arktype-json/src/exports/runtime.ts +++ b/packages/3-extensions/arktype-json/src/exports/runtime.ts @@ -15,7 +15,7 @@ import { createCodecRegistry } from '@prisma-next/sql-relational-core/ast'; import type { SqlRuntimeExtensionDescriptor } from '@prisma-next/sql-runtime'; -import { arktypeJsonCodec } from '../core/arktype-json-codec'; +import { arktypeJsonCodec, arktypeJsonEmitCodec } from '../core/arktype-json-codec'; import { arktypeJsonPackMeta } from '../core/pack-meta'; function createArktypeJsonCodecRegistry() { @@ -30,6 +30,22 @@ export const arktypeJsonRuntimeDescriptor: SqlRuntimeExtensionDescriptor<'postgr version: arktypeJsonPackMeta.version, familyId: 'sql' as const, targetId: 'postgres' as const, + // Mirror `arktypeJsonPackMeta.types.codecTypes.codecInstances` here so + // that the runtime-plane `extractCodecLookup` (called by the postgres + // adapter at `create()` time, see + // `packages/3-targets/6-adapters/postgres/src/exports/runtime.ts`) + // discovers `arktype/json@1`. Without this, `renderTypedParam` throws + // "assembled codec lookup has no entry" the first time a query touches + // an arktypeJson column. The codec carries `meta.db.sql.postgres.nativeType` + // = `'jsonb'` so the renderer emits `$N::jsonb` (jsonb is excluded from + // `POSTGRES_INFERRABLE_NATIVE_TYPES`, so the cast is required). + // Encode/decode dispatch goes through the unified descriptor map's + // `factory(params)(ctx)`, never through this metadata stub. 
+ types: { + codecTypes: { + codecInstances: [arktypeJsonEmitCodec], + }, + }, codecs: createArktypeJsonCodecRegistry, parameterizedCodecs: () => [arktypeJsonCodec], create() { diff --git a/packages/3-extensions/arktype-json/test/arktype-json-codec.test-d.ts b/packages/3-extensions/arktype-json/test/arktype-json-codec.test-d.ts index 2663911f17..e8332af9bb 100644 --- a/packages/3-extensions/arktype-json/test/arktype-json-codec.test-d.ts +++ b/packages/3-extensions/arktype-json/test/arktype-json-codec.test-d.ts @@ -1,3 +1,4 @@ +import type { JsonValue } from '@prisma-next/contract/types'; import type { Codec, CodecDescriptor, @@ -24,7 +25,12 @@ export type _Arktype_TakesCtx = Assert[0], Codec type ProductResolved = ReturnType; type ProductJs = - ProductResolved extends Codec<'arktype/json@1', readonly ['equality'], string, infer Js> + ProductResolved extends Codec< + 'arktype/json@1', + readonly ['equality'], + string | JsonValue, + infer Js + > ? Js : never; @@ -43,7 +49,7 @@ const audit = arktypeJson(auditSchema); type AuditType = typeof audit.type; type AuditResolved = ReturnType; type AuditJs = - AuditResolved extends Codec<'arktype/json@1', readonly ['equality'], string, infer Js> + AuditResolved extends Codec<'arktype/json@1', readonly ['equality'], string | JsonValue, infer Js> ? Js : never; type AuditInfer = typeof auditSchema.infer; diff --git a/packages/3-extensions/arktype-json/test/arktype-json-codec.test.ts b/packages/3-extensions/arktype-json/test/arktype-json-codec.test.ts index 13617fcd88..406c15dd0a 100644 --- a/packages/3-extensions/arktype-json/test/arktype-json-codec.test.ts +++ b/packages/3-extensions/arktype-json/test/arktype-json-codec.test.ts @@ -117,6 +117,23 @@ describe('arktypeJson encode/decode (Promise-lifted async surface)', () => { description: 'A widget', }); }); + + // Postgres `pg` driver returns `jsonb` cells as already-parsed JS values + // by default. 
The codec must accept both wire shapes — string and
+  // pre-parsed JsonValue — to match `pgJsonbCodec`'s tolerance.
+  it('decode accepts already-parsed jsonb values from the driver', async () => {
+    const codec = arktypeJson(productSchema).type(SYNTH_CTX);
+    const wire = { name: 'Widget', price: 10 };
+    expect(await codec.decode(wire, CALL_CTX)).toEqual(wire);
+  });
+
+  it('decode validates pre-parsed payloads against the schema', async () => {
+    const codec = arktypeJson(productSchema).type(SYNTH_CTX);
+    const wire = { name: 'Widget' };
+    await expect(codec.decode(wire, CALL_CTX)).rejects.toThrow(
+      /JSON_SCHEMA_VALIDATION_FAILED|price/,
+    );
+  });
 });
 
 describe('arktypeJson roundtrip', () => {
@@ -136,13 +153,12 @@ describe('arktypeJson roundtrip', () => {
     expect(restored).toEqual(original);
   });
 
-  // The codec derives both `encode` (wire string) and `encodeJson`
-  // (JsonValue) outputs from the same `JSON.stringify` → `JSON.parse`
-  // round-trip, then runs the schema on the normalized payload. Without
-  // this unification, `encodeJson` would emit non-JSON-safe values
-  // unchanged while `encode` silently dropped or transformed them,
-  // producing wire payloads the codec's own decode path would later
-  // reject. The next two tests pin both halves of that contract.
+  // Encode is intentionally schema-independent (see ADR 208 § Case J and
+  // the encode-fallback trade-off). Both `encode` and `encodeJson` derive
+  // their output from the same `JSON.stringify` → `JSON.parse` round-trip
+  // so a non-plain-object runtime value (class instance, etc.) still
+  // produces the same JSON-safe shape on both surfaces. Schema validation
+  // runs on read.
it('encode and encodeJson agree on the normalized payload', async () => {
     const codec = arktypeJson(productSchema).type(SYNTH_CTX);
     const original = { name: 'Widget', price: 10, description: 'desc' };
@@ -151,14 +167,13 @@
     expect(wire).toBe(JSON.stringify(json));
   });
 
-  it('encode rejects non-JSON-safe runtime values via the shared validator', async () => {
+  it('encode normalizes class-instance inputs to a JSON-safe wire shape', async () => {
+    // Class instance with prototype-only methods. `JSON.stringify` strips
+    // those, so the normalized wire payload is `{ name, price }`. The
+    // codec doesn't validate against the schema on encode (that's a
+    // decode-side concern under the JSON-validator philosophy), but the
+    // round-trip must still produce a JSON-safe value.
     const codec = arktypeJson(productSchema).type(SYNTH_CTX);
-    // Class instance with extra prototype-only methods. `JSON.stringify`
-    // strips those so the wire payload normalizes to `{ name, price }`,
-    // but the schema must still accept the normalized shape — this case
-    // does. The check ensures the unification path runs without
-    // throwing for legitimate JSON-safe payloads even when the runtime
-    // type isn't a plain object.
     class Widget {
       constructor(
         public name: string,
@@ -174,21 +189,50 @@
   });
 
   it('encode rejects values that are not representable as JSON', async () => {
-    // A schema whose inferred type accepts a function field would never
-    // be authored intentionally — but the type system can't catch that
-    // for `unknown`-typed schemas at runtime. Cast through the typed
-    // surface to model the case where a runtime value is structurally
-    // unserializable; both encode paths must reject.
+    // `JSON.stringify(undefined)` returns `undefined`, not a string. 
The + // codec rejects with a plain `Error` rather than a runtime envelope: + // the runtime wraps it as `RUNTIME.ENCODE_FAILED` at the dispatch + // boundary. The codec doesn't pre-stamp a stable code here because + // the underlying problem is a JS-level serialization failure, not a + // schema rejection — and stamping a schema code would be misleading + // now that schema validation no longer runs on encode. const anySchema = type('object'); const codec = arktypeJson(anySchema).type(SYNTH_CTX); - // `JSON.stringify(undefined)` returns `undefined`, not a string; the - // serializeToJsonSafe guard rejects this with a clear schema-failure - // code rather than a downstream `JSON.parse(undefined)` SyntaxError. await expect(codec.encode(undefined as never, CALL_CTX)).rejects.toThrow( - /not representable as JSON|JSON_SCHEMA_VALIDATION_FAILED/, + /not representable as JSON/, ); - expect(() => codec.encodeJson(undefined as never)).toThrow( - /not representable as JSON|JSON_SCHEMA_VALIDATION_FAILED/, + expect(() => codec.encodeJson(undefined as never)).toThrow(/not representable as JSON/); + }); + + // F02 — Postgres drivers can return `jsonb` cells as already-parsed JS + // values, including primitive types. For an arktype `type('string')` + // schema, a stored `"alice"` arrives as the bare JS string `alice` — + // a SyntaxError on `JSON.parse(wire)`. The decode path tries + // `JSON.parse` first (covering raw-JSON-text drivers and pre-parsed + // objects/arrays/numbers/bools/null) and falls back to the original + // string when parsing throws (covering pre-parsed JSON string + // primitives). 
+ it('decode accepts pre-parsed JSON string primitives for string-schema columns', async () => { + const stringSchema = type('string'); + const codec = arktypeJson(stringSchema).type(SYNTH_CTX); + expect(await codec.decode('alice', CALL_CTX)).toBe('alice'); + }); + + it('decode still parses raw-JSON-text wire for string-schema columns', async () => { + const stringSchema = type('string'); + const codec = arktypeJson(stringSchema).type(SYNTH_CTX); + // Raw JSON text wire — `JSON.parse('"bob"')` returns the string `bob`. + expect(await codec.decode('"bob"', CALL_CTX)).toBe('bob'); + }); + + it('decode rejects pre-parsed primitives that violate the schema', async () => { + const stringSchema = type('string'); + const codec = arktypeJson(stringSchema).type(SYNTH_CTX); + // Pre-parsed number primitive — fails string-schema validation. The + // wire is `42` (not a string), goes straight through `parseWireValue` + // (non-string branch), and the schema rejects it. + await expect(codec.decode(42, CALL_CTX)).rejects.toThrow( + /JSON_SCHEMA_VALIDATION_FAILED|string/, ); }); }); @@ -260,16 +304,38 @@ describe('arktypeJsonCodec descriptor', () => { })(SYNTH_CTX), ).toThrow(/Failed to rehydrate arktype schema/); }); + + // The factory rehydrates the schema from `jsonIr` and asserts that + // the schema's `expression` round-trips. A divergence means either + // the contract was hand-edited or an arktype version mismatch caused + // the rendering to drift — in both cases the emitted `contract.d.ts` + // is no longer faithful to the runtime schema, and the previous + // `console.warn` was too quiet (the bad types had already shipped + // by the time the warning fired). 
+ it('factory throws RUNTIME.TYPE_PARAMS_INVALID when typeParams.expression diverges from rehydrated schema', () => { + const descriptor = arktypeJson(productSchema); + expect(() => + arktypeJsonCodec.factory({ + ...descriptor.typeParams, + expression: 'an obviously stale expression', + })(SYNTH_CTX), + ).toThrow(/typeParams\.expression .* does not match/); + }); + + it('factory accepts matching typeParams.expression without complaint', () => { + const descriptor = arktypeJson(productSchema); + expect(() => arktypeJsonCodec.factory(descriptor.typeParams)(SYNTH_CTX)).not.toThrow(); + }); }); describe('serialize/rehydrate roundtrip', () => { it("rehydrated schema's behavior matches the source", async () => { // The rehydration round-trip is the load-bearing guarantee for the // emit-vs-runtime parity check: the rehydrated schema validates the - // same payloads as the source (semantic identity, even if the - // expression diverges across arktype versions). The descriptor's - // factory carries a defensive console.warn for expression - // divergence; we only assert on the validation side here. + // same payloads as the source. The factory enforces expression + // round-trip stability by throwing if `expression` diverges; this + // test exercises the happy path where serialized and rehydrated + // expressions match. const descriptor = arktypeJson(productSchema); const reCodec = arktypeJsonCodec.factory(descriptor.typeParams)(SYNTH_CTX); const sourceCodec = descriptor.type(SYNTH_CTX); @@ -330,11 +396,12 @@ describe('decodeJson schema enforcement', () => { }); }); -describe('arktypeJsonEmitCodec (emit-only shim)', () => { - // The emit-only codec carries `renderOutputType` so the framework - // emitter's `CodecLookup` can resolve the column's TS type at emit - // time. encode/decode are sentinels that throw if invoked — runtime - // materialization always goes through the descriptor's factory. 
+describe('arktypeJsonEmitCodec (metadata-only shim)', () => { + // The metadata-only codec carries `renderOutputType` (read by the + // framework emitter's `CodecLookup`) and `meta.db.sql.postgres.nativeType` + // (read by the postgres SQL renderer's cast policy). encode/decode are + // sentinels that throw if invoked — runtime materialization always + // goes through the descriptor's factory. it('exposes the codec id and native type', () => { expect(arktypeJsonEmitCodec.id).toBe(ARKTYPE_JSON_CODEC_ID); expect(arktypeJsonEmitCodec.targetTypes).toEqual([ARKTYPE_JSON_NATIVE_TYPE]); @@ -361,16 +428,16 @@ describe('arktypeJsonEmitCodec (emit-only shim)', () => { }); it('encode/decode reject because runtime materialization goes through the descriptor', async () => { - await expect(arktypeJsonEmitCodec.encode('value', CALL_CTX)).rejects.toThrow(/emit-only/); - await expect(arktypeJsonEmitCodec.decode('wire', CALL_CTX)).rejects.toThrow(/emit-only/); + await expect(arktypeJsonEmitCodec.encode('value', CALL_CTX)).rejects.toThrow(/metadata-only/); + await expect(arktypeJsonEmitCodec.decode('wire', CALL_CTX)).rejects.toThrow(/metadata-only/); }); it('encodeJson/decodeJson throw because runtime materialization goes through the descriptor', () => { // Mirrors `encode`/`decode`: a contract-load path that resolved to - // this emit-only stub must fail fast at the JSON boundary instead + // this metadata-only stub must fail fast at the JSON boundary instead // of silently returning unvalidated payloads. 
- expect(() => arktypeJsonEmitCodec.encodeJson('payload')).toThrow(/emit-only/); - expect(() => arktypeJsonEmitCodec.decodeJson({ a: 1 })).toThrow(/emit-only/); + expect(() => arktypeJsonEmitCodec.encodeJson('payload')).toThrow(/metadata-only/); + expect(() => arktypeJsonEmitCodec.decodeJson({ a: 1 })).toThrow(/metadata-only/); }); }); diff --git a/packages/3-extensions/arktype-json/test/extension-descriptors.test.ts b/packages/3-extensions/arktype-json/test/extension-descriptors.test.ts index 07ea531e64..e9cd084db5 100644 --- a/packages/3-extensions/arktype-json/test/extension-descriptors.test.ts +++ b/packages/3-extensions/arktype-json/test/extension-descriptors.test.ts @@ -1,5 +1,10 @@ +import { extractCodecLookup } from '@prisma-next/framework-components/control'; import { describe, expect, it } from 'vitest'; -import { ARKTYPE_JSON_CODEC_ID, arktypeJsonCodec } from '../src/core/arktype-json-codec'; +import { + ARKTYPE_JSON_CODEC_ID, + arktypeJsonCodec, + arktypeJsonEmitCodec, +} from '../src/core/arktype-json-codec'; import { arktypeJsonExtensionDescriptor } from '../src/exports/control'; import { arktypeJsonRuntimeDescriptor } from '../src/exports/runtime'; @@ -31,6 +36,31 @@ describe('arktypeJsonRuntimeDescriptor', () => { expect(instance.familyId).toBe('sql'); expect(instance.targetId).toBe('postgres'); }); + + // The runtime descriptor must surface `arktypeJsonEmitCodec` through + // `types.codecTypes.codecInstances` so the postgres adapter's + // `extractCodecLookup` can resolve `arktype/json@1` for cast-policy + // metadata. Without this, `renderTypedParam` throws "assembled codec + // lookup has no entry" the first time a query touches an arktypeJson + // column. Regression guard for the bug shipped in #402. 
+ it('exposes arktype/json@1 metadata through types.codecTypes.codecInstances', () => { + const codecInstances = arktypeJsonRuntimeDescriptor.types?.codecTypes?.codecInstances; + expect(codecInstances).toContain(arktypeJsonEmitCodec); + }); + + it('extractCodecLookup over the runtime descriptor resolves arktype/json@1', () => { + const lookup = extractCodecLookup([arktypeJsonRuntimeDescriptor]); + const resolved = lookup.get(ARKTYPE_JSON_CODEC_ID); + expect(resolved).toBe(arktypeJsonEmitCodec); + }); + + // jsonb is excluded from POSTGRES_INFERRABLE_NATIVE_TYPES, so the + // SQL renderer's cast policy depends on this metadata field to emit + // `$N::jsonb` at parameter sites. Pin the meta shape so a future + // refactor doesn't silently drop it. + it('arktypeJsonEmitCodec carries postgres jsonb native-type metadata', () => { + expect(arktypeJsonEmitCodec.meta?.db?.sql?.postgres?.nativeType).toBe('jsonb'); + }); }); describe('arktypeJsonExtensionDescriptor (control)', () => { diff --git a/packages/3-extensions/pgvector/src/exports/runtime.ts b/packages/3-extensions/pgvector/src/exports/runtime.ts index 5d125bb098..579e0acacf 100644 --- a/packages/3-extensions/pgvector/src/exports/runtime.ts +++ b/packages/3-extensions/pgvector/src/exports/runtime.ts @@ -46,6 +46,17 @@ const parameterizedCodecDescriptors = [ paramsSchema: vectorParamsSchema, renderOutputType: (params: { readonly length: number }) => `Vector<${params.length}>`, factory: vectorFactory, + // pgvector's wire format `[v1,v2,...]` is dimension-independent — every + // resolved instance encodes equivalently regardless of declared length. + // Today the factory dodges the registry's reference-based ambiguity + // check by returning `sharedVectorCodec` for every params, so two + // `vector(N)` columns of different lengths happen to share one codec + // instance. Declare the invariant explicitly so a future refactor that + // closes over `params` (e.g. 
capping wire length to declared dimension + // — see comment above `vectorFactory`) keeps multi-column contracts + // working without surprises. Becomes vestigial when AC-5 (TML-2357) + // threads `ParamRef.refs` through column-bound construction sites. + encodeIsParamsIndependent: true, }, ] as const satisfies ReadonlyArray< RuntimeParameterizedCodecDescriptor<{ readonly length: number }> diff --git a/packages/3-extensions/pgvector/test/manifest.test.ts b/packages/3-extensions/pgvector/test/manifest.test.ts index caf71e6413..73aadb2f12 100644 --- a/packages/3-extensions/pgvector/test/manifest.test.ts +++ b/packages/3-extensions/pgvector/test/manifest.test.ts @@ -1,6 +1,8 @@ import { timeouts } from '@prisma-next/test-utils'; import { describe, expect, it } from 'vitest'; +import { VECTOR_CODEC_ID } from '../src/core/constants'; import { pgvectorExtensionDescriptor } from '../src/exports/control'; +import pgvectorRuntimeDescriptor from '../src/exports/runtime'; describe('pgvector descriptor', () => { it('has correct metadata', () => { @@ -49,4 +51,20 @@ describe('pgvector descriptor', () => { }, timeouts.typeScriptCompilation, ); + + // The pgvector parameterized descriptor declares + // `encodeIsParamsIndependent: true` so the runtime registry tolerates + // multiple `vector(N)` columns of different lengths sharing the same + // codec id without rejecting `forCodecId('pg/vector@1')` as ambiguous. + // The wire format `[v1,v2,...]` is dimension-independent — every + // resolved instance encodes equivalently. Pinning the flag here keeps + // the invariant load-bearing if anyone refactors `vectorFactory` to + // close over `params` (which would otherwise produce reference-distinct + // instances and trip the registry's ambiguity guard). 
+ it('parameterized vector descriptor declares encodeIsParamsIndependent', () => { + const parameterizedCodecs = pgvectorRuntimeDescriptor.parameterizedCodecs(); + const vectorDescriptor = parameterizedCodecs.find((d) => d.codecId === VECTOR_CODEC_ID); + expect(vectorDescriptor).toBeDefined(); + expect(vectorDescriptor?.encodeIsParamsIndependent).toBe(true); + }); }); diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 5b7ff34c39..06e82fa8da 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -2285,7 +2285,7 @@ importers: specifier: workspace:* version: link:../../2-sql/5-runtime arktype: - specifier: ^2.1.29 + specifier: ~2.1.29 version: 2.1.29 devDependencies: '@prisma-next/sql-contract': diff --git a/test/e2e/framework/test/arktype-json.test.ts b/test/e2e/framework/test/arktype-json.test.ts new file mode 100644 index 0000000000..1247d133d7 --- /dev/null +++ b/test/e2e/framework/test/arktype-json.test.ts @@ -0,0 +1,86 @@ +import { readFile } from 'node:fs/promises'; +import { dirname, resolve } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import arktypeJson from '@prisma-next/extension-arktype-json/runtime'; +import type { Vector } from '@prisma-next/extension-pgvector/codec-types'; +import pgvector from '@prisma-next/extension-pgvector/runtime'; +import postgres from '@prisma-next/postgres/runtime'; +import type { Runtime } from '@prisma-next/sql-runtime'; +import { timeouts, withDevDatabase } from '@prisma-next/test-utils'; +import { describe, expect, it } from 'vitest'; +import type { Contract } from './fixtures/generated/contract.d'; +import { runDbInit } from './utils'; + +// Round-trip coverage for the `Embedding.profile` arktype-json column. +// The fixture has carried this column since the original arktype-json +// landing, but no test wrote or read it — so the runtime correctness +// gaps (missing cast-lookup registration, hardcoded `JSON.parse(wire)` +// in decode) surfaced only in production. 
This file exercises the full
+// pipeline:
+//
+// create → encode (JSON.stringify) → SQL renderer
+// ($N::jsonb cast lookup) → driver write → driver read (pre-parsed
+// JSON for jsonb on `pg`) → decode (schema validate) → ORM result.
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const contractJsonPath = resolve(__dirname, 'fixtures/generated/contract.json');
+
+async function loadContractJson(): Promise<unknown> {
+  const content = await readFile(contractJsonPath, 'utf-8');
+  return JSON.parse(content);
+}
+
+async function withPostgresClient(
+  callback: (db: ReturnType<typeof postgres<Contract>>) => Promise<void>,
+): Promise<void> {
+  const contractJson = await loadContractJson();
+  await withDevDatabase(async ({ connectionString }) => {
+    await runDbInit({ connectionString, contractJsonPath });
+    const db = postgres({
+      contractJson,
+      url: connectionString,
+      extensions: [pgvector, arktypeJson],
+    });
+    let runtime: Runtime | undefined;
+    try {
+      runtime = await db.connect();
+      await db.orm.User.first();
+      await callback(db);
+    } finally {
+      await runtime?.close();
+    }
+  });
+}
+
+function buildEmbedding(seed: number): Vector<1536> {
+  return Array.from({ length: 1536 }, (_, i) => (i + seed) / 1536) as Vector<1536>;
+}
+
+describe('arktype-json column round-trip', { timeout: timeouts.spinUpPpgDev }, () => {
+  it('writes and reads back a typed JSON value through the ORM', async () => {
+    await withPostgresClient(async (db) => {
+      const created = await db.orm.Embedding.create({
+        embedding: buildEmbedding(0),
+        profile: { name: 'alice', age: 30 },
+      });
+
+      const found = await db.orm.Embedding.where((e) => e.id.eq(created.id)).first();
+      expect(found).not.toBeNull();
+      expect(found!.profile).toEqual({ name: 'alice', age: 30 });
+    });
+  });
+});
+
+// Decode-side schema rejection is exercised at the unit level in
+// `packages/3-extensions/arktype-json/test/arktype-json-codec.test.ts`
+// (decode rejects pre-parsed payloads that violate the schema) and at
+// the runtime level in
+// `packages/2-sql/5-runtime/test/json-schema-validation.test.ts` (the
+// runtime preserves `RUNTIME.JSON_SCHEMA_VALIDATION_FAILED` thrown from
+// codec.decode without rewrapping). Reproducing it through the ORM is
+// awkward because `create` returns a decoded row via `RETURNING`, so a
+// schema-violating write surfaces the decode error from `create` itself
+// rather than a subsequent read — there's no single ORM call that
+// cleanly demonstrates the validate-on-read contract end-to-end without
+// raw-SQL bypass infrastructure that this fixture doesn't expose. Unit
+// coverage is sufficient.