v3.2: Guidance on searching and evaluating schemas #4743

Open: wants to merge 2 commits into base `v3.2-dev`

110 changes: 108 additions & 2 deletions src/oas.md
@@ -309,7 +309,7 @@ Using a `contentEncoding` of `base64url` ensures that URL encoding (as required

The `contentMediaType` keyword is redundant if the media type is already set:

* as the key for a [MediaType Object](#media-type-object)
* as the key for a [Media Type Object](#media-type-object)
* in the `contentType` field of an [Encoding Object](#encoding-object)

If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.
@@ -1257,6 +1257,8 @@ See [Working With Examples](#working-with-examples) for further guidance regardi

This object MAY be extended with [Specification Extensions](#specification-extensions).

Note that correlating Encoding Objects with Schema Objects may require [schema searches](#searching-schemas) for keywords such as `properties`, `prefixItems`, and `items`.

See also the [Media Type Registry](#media-type-registry).

##### Complete vs Streaming Content
@@ -1639,7 +1641,7 @@ These fields MAY be used either with or without the RFC6570-style serialization

| Field Name | Type | Description |
| ---- | :----: | ---- |
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the property type as shown in the table below. |
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the type (determined by a [schema search](#searching-schemas)) as shown in the table below. |
| <a name="encoding-headers"></a>headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart`. |

This object MAY be extended with [Specification Extensions](#specification-extensions).
@@ -2599,6 +2601,10 @@ Note that JSON Schema Draft 2020-12 does not require an `x-` prefix for extensio
The [`format` keyword (when using default format-annotation vocabulary)](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.2.1) and the [`contentMediaType`, `contentEncoding`, and `contentSchema` keywords](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-8.2) define constraints on the data, but are treated as annotations instead of being validated directly.
Extended validation is one way that these constraints MAY be enforced.

In addition to extended validation, annotations are the most effective way to determine whether these keywords impact the type and structure of the fully parsed data.
For example, formats such as `int64` can be applied to JSON strings, as JSON numbers have limitations that make large integers non-portable.
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and MUST document the limitations this imposes.

###### Validating `readOnly` and `writeOnly`

The `readOnly` and `writeOnly` keywords are annotations, as JSON Schema is not aware of how the data it is validating is being used.
@@ -2611,6 +2617,106 @@ Even when read-only fields are not required, stripping them is burdensome for cl

Note that the behavior of `readOnly` in particular differs from that specified by version 3.0 of this specification.

##### Working with Schemas

In addition to schema evaluation, which encompasses both validation and annotation, some OAS features require inspecting schemas in other ways.

###### Preparing Data for Schema Evaluation

When the data source is a JSON document, preparing the data is trivial as parsing JSON produces a suitable data structure.
Some other media types, as well as URL components and header values, lack sufficient type information to parse directly to suitable data types.

Consider this URL-encoded form:

```uri
foo=42&bar=42
```

As URL query parameters are strings, this would naturally parse to something equivalent to the following JSON:

```json
{
  "foo": "42",
  "bar": "42"
}
```

But consider this [Media Type Object](#media-type-object) for the form:

```yaml
application/x-www-form-urlencoded:
  schema:
    type: object
    properties:
      foo:
        type: string
      bar:
        type: integer
```

From the `schema` field, we can tell that the correct data structure would actually be equivalent to:

```json
{
  "foo": "42",
  "bar": 42
}
```

In order to prepare the correct data structure for evaluation in such cases, implementations MUST perform a [schema search](#searching-schemas) for the `type` keyword.
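As an illustration only, the preparation step for the form above might look like the following Python sketch. The `coerce` and `prepare_form_data` helpers are hypothetical names, not part of any OAS tooling, and the sketch inspects only the `properties` of the starting schema rather than performing a full schema search.

```python
from typing import Optional
from urllib.parse import parse_qsl

# The schema from the Media Type Object above, written as a plain dict.
schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "string"},
        "bar": {"type": "integer"},
    },
}


def coerce(value: str, type_name: Optional[str]):
    """Convert a raw form string to the JSON type named by `type_name`."""
    if type_name == "integer":
        return int(value)
    if type_name == "number":
        return float(value)
    if type_name == "boolean":
        if value in ("true", "false"):
            return value == "true"
        raise ValueError(f"not a boolean: {value!r}")
    # Strings, and properties with no single `type`, are left as parsed.
    return value


def prepare_form_data(raw: str, schema: dict) -> dict:
    """Parse a URL-encoded form and coerce each value using the schema's `type`."""
    properties = schema.get("properties", {})
    return {
        name: coerce(value, properties.get(name, {}).get("type"))
        for name, value in parse_qsl(raw)
    }


prepared = prepare_form_data("foo=42&bar=42", schema)
```

Here `prepared` holds the string `"42"` for `foo` and the integer `42` for `bar`, matching the corrected data structure shown above.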

###### Applying Further Type Information

The `format` keyword provides more fine-grained type information, and can even change the underlying data type for the purposes of the application.
For example, if `foo` had the schema `{"type": "string", "format": "int64"}`, the data structure used for validation would still be the same, but the application would need to convert the string `"42"` to the 64-bit integer `42`.
Similarly, the `content*` keywords can indicate further structure within a string.

Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information or perform a [schema search](#searching-schemas), and MUST document which approach they implement.
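The `int64` case described above could be handled after validation along the following lines. This is a minimal sketch that assumes the `format` value has already been obtained, whether by annotation collection or by a schema search; `apply_format` is a hypothetical helper name, and the range check is an assumption about what an application might reasonably enforce.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def apply_format(value, schema: dict):
    """Return the application-level value for an already validated instance."""
    if schema.get("type") == "string" and schema.get("format") == "int64":
        number = int(value)
        if not INT64_MIN <= number <= INT64_MAX:
            raise ValueError(f"{value!r} does not fit in a 64-bit integer")
        return number
    # No recognized format: the validated value is used unchanged.
    return value


converted = apply_format("42", {"type": "string", "format": "int64"})
```

The instance seen by JSON Schema remains the string `"42"`; only the value handed to the application changes.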

Note that parsing string contents based on `contentMediaType` carries the same security risks as parsing HTTP message bodies based on `Content-Type`; see [Handling External Resources](#handling-external-resources) for further information.

###### Schema Evaluation and Binary Data

As noted under [Working with Binary Data](#working-with-binary-data), Schema Objects for binary documents do not use any standard JSON Schema assertions, as the only ones that could apply (`const` and `enum`) would require embedding raw binary into JSON, which is not possible.

However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluation:

1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently. For example, a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords.
2. Perform [schema searches](#searching-schemas) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data.

Implementations MUST document which strategy or strategies they use, as well as any known limitations.

##### Searching Schemas

Several OAS features require searching Schema Objects for keywords indicating the data type and/or structure.
Even when a requirement is stated in terms of schema keywords, implementations MUST support determining the necessary information (including type) by inspecting the data, and possibly also annotations such as `format`, whenever the data is in a form [suitable for schema evaluation](#preparing-data-for-schema-evaluation) and such inspection is sufficient. This approach is effective regardless of how schemas are structured.

If this is not possible, the schemas MUST be searched to see if the information can be determined without performing evaluation.
As schema organization can become very complex, implementations are not expected to handle every possible schema layout.
However, given a known starting point schema (usually the value of the nearest `schema` field), implementations MUST search the following for the relevant keywords (e.g. `type`, `format`, `contentMediaType`, `properties`, `prefixItems`, `items`, etc.):

* The starting point schema itself
* Any schema reachable from there solely through `$ref` and/or `allOf`

These schemas are guaranteed to be applied to any instance.
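The minimum required search can be sketched in Python as follows. This is not a reference implementation: `resolve_pointer` and `find_keyword` are hypothetical helpers, only same-document `$ref` values are handled, and percent-decoding of URI fragments is omitted.

```python
def resolve_pointer(root, ref: str):
    """Resolve a same-document reference such as '#/components/schemas/Pet'."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled in this sketch")
    node = root
    for token in ref[1:].split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def find_keyword(root, schema, keyword: str, _seen=None) -> list:
    """Collect values of `keyword` from `schema` and from every schema
    reachable from it solely through `$ref` and/or `allOf`."""
    seen = set() if _seen is None else _seen
    # Boolean schemas carry no keywords; the `seen` set guards against cycles.
    if not isinstance(schema, dict) or id(schema) in seen:
        return []
    seen.add(id(schema))
    found = [schema[keyword]] if keyword in schema else []
    if "$ref" in schema:
        target = resolve_pointer(root, schema["$ref"])
        found += find_keyword(root, target, keyword, seen)
    for subschema in schema.get("allOf", []):
        found += find_keyword(root, subschema, keyword, seen)
    return found


document = {
    "components": {
        "schemas": {
            "Base": {
                "type": "object",
                "properties": {"id": {"type": "integer"}},
            },
            "Pet": {
                "allOf": [
                    {"$ref": "#/components/schemas/Base"},
                    {"properties": {"name": {"type": "string"}}},
                ],
                # Not searched: `oneOf` subschemas may not apply to an instance.
                "oneOf": [{"properties": {"tag": {"type": "string"}}}],
            },
        }
    }
}

pet = document["components"]["schemas"]["Pet"]
types = find_keyword(document, pet, "type")
property_maps = find_keyword(document, pet, "properties")
```

With `Pet` as the starting point schema, the search finds `type: object` through the `$ref` to `Base` and collects the `id` and `name` property maps, while the `tag` property under `oneOf` is deliberately left out.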

In some cases, such as correlating [Encoding Objects](#encoding-object) with Schema Objects using fields in a [Media Type Object](#media-type-object), it is necessary to first find a keyword such as `properties`, and then treat its subschema(s) as starting point schemas for further searches.

Implementations MAY analyze subschemas of other keywords such as `oneOf` or `dependentSchemas`, or examine possible `$dynamicRef` targets, and MUST document the extent and nature of any such additional support.

###### Handling Multiple Types

When a `type` keyword with multiple values (e.g. `type: ["number", "null"]`) is found, implementations MUST attempt to use the types as follows, ignoring any types not present in the `type` list:

1. Determine if the data can be parsed as whichever of `null`, `number`, `object`, or `array` are present in the `type` list, treating `integer` as `number` for this step.
2. If the data can be parsed as a number, and `integer` is in the `type` list, check to see if the value is a mathematical integer, regardless of its textual representation.
3. If the data has not been parsed successfully and `string` is in the `type` list, parse it as a string.

This process is sufficient to produce data that can be validated by JSON Schema.
If `format` or `content*` are needed for further parsing, they can be checked in the same way as `type`, or as annotations from the schema evaluation process.
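The three steps above might be sketched in Python as follows. The sketch assumes, purely for illustration, that each raw value uses JSON text syntax (so `null` is spelled `null`); real serializations such as form or header styles would need their own parsing rules. `parse_with_types` is a hypothetical helper name, and `boolean` is omitted because the steps above do not cover it.

```python
import json


def _reject_constant(name: str):
    # NaN and Infinity are accepted by Python's parser but are not valid JSON.
    raise ValueError(f"not a JSON number: {name}")


def parse_with_types(raw: str, types: list):
    """Parse `raw` according to a multi-valued `type` list."""
    allowed = set(types)
    parsed_ok = True
    try:
        value = json.loads(raw, parse_constant=_reject_constant)
    except ValueError:
        parsed_ok, value = False, None

    if parsed_ok:
        # Step 1: null, object, array, and number (integer handled in step 2).
        if value is None and "null" in allowed:
            return None
        if isinstance(value, dict) and "object" in allowed:
            return value
        if isinstance(value, list) and "array" in allowed:
            return value
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "number" in allowed:
                return value
            # Step 2: accept any mathematical integer, e.g. "1.0".
            if "integer" in allowed:
                if isinstance(value, int):
                    return value
                if value.is_integer():
                    return int(value)

    # Step 3: fall back to the raw string.
    if "string" in allowed:
        return raw
    raise ValueError(f"{raw!r} does not match any of {sorted(allowed)}")
```

For example, `parse_with_types("1.0", ["integer", "string"])` yields the integer `1`, while `parse_with_types("1.5", ["integer", "string"])` falls through to the string `"1.5"`.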

Implementations that choose to search conditional keywords such as `anyOf` SHOULD use this same precedence to resolve multiple possible `type` values found through such searches.

##### Data Modeling Techniques

###### Composition and Inheritance (Polymorphism)