149 changes: 123 additions & 26 deletions specs/jsonschema-core.md
@@ -2029,32 +2029,129 @@ SHOULD use the terms defined by this document to do so.

## Security Considerations {#security}

Both schemas and instances are JSON values. As such, all security considerations
defined in [RFC 8259][rfc8259] apply.

Instances and schemas are both frequently written by untrusted third parties, to
be deployed on public Internet servers. Implementations should take care that
the parsing and evaluating against schemas does not consume excessive system
resources. Implementations MUST NOT fall into an infinite loop.

A malicious party could cause an implementation to repeatedly collect a copy of
a very large value as an annotation. Implementations SHOULD guard against
excessive consumption of system resources in such a scenario.

Servers MUST ensure that malicious parties cannot change the functionality of
existing schemas by uploading a schema with a pre-existing or very similar
`$id`.

Individual JSON Schema extensions are liable to also have their own security
considerations. Consult the respective specifications for more information.

Schema authors should take care with `$comment` contents, as a malicious
implementation can display them to end-users in violation of a spec, or fail to
strip them if such behavior is expected.

A malicious schema author could place executable code or other dangerous
material within a `$comment`. Implementations MUST NOT parse or otherwise take
action based on `$comment` contents.
While schemas and instances are not always represented as JSON text, they are
defined in terms of the JSON data model. As such, the security considerations
defined in [RFC 8259][rfc8259] may still apply in environments where text-based
representations are used, particularly those considerations related to parsing,
number precision, and structural limitations.

Schemas and instances are frequently authored by untrusted parties.
Implementations that accept or evaluate such inputs may be exposed to several
classes of attack, particularly denial-of-service (DoS) by means of resource
exhaustion.

### Nested `anyOf`/`oneOf`

One risk for resource exhaustion in JSON Schema arises from the nested use of
`anyOf` and `oneOf`. While a single combinator keyword with multiple subschemas
is typically manageable, nesting them causes the number of evaluation paths to
grow exponentially.

For example, a `oneOf` with 5 subschemas, each containing another `oneOf` with 5
options, results in 25 evaluation paths. Adding a third level increases this to
125, and so on. Attackers can exploit this by crafting schemas that force
validators to explore a large number of branches.

This evaluation explosion is particularly dangerous when each path involves
expensive work such as collecting large annotations or evaluating complex
regular expressions. These effects multiply across paths and can result in
excessive CPU or memory consumption, leading to denial-of-service.

Implementations that evaluate untrusted schemas are encouraged to take steps to
mitigate these threats with measures such as bounding combinator keyword depth
and breadth, limiting memory used for annotation collection, and guarding
against resource-intensive validations such as pathological regexes.
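
As a rough illustration of bounding combinator depth and breadth, the following Python sketch measures how deeply `anyOf`/`oneOf` are nested and how wide the widest one is. The function names and the default limits are hypothetical, the walk is deliberately naive (it treats every object or array value as a possible subschema), and it does not follow references.

```python
from typing import Any, Tuple

COMBINATORS = ("anyOf", "oneOf")


def combinator_stats(schema: Any, depth: int = 0) -> Tuple[int, int]:
    """Return (max nesting depth, max breadth) of anyOf/oneOf in a schema.

    Naive walk: every dict or list value is treated as a potential subschema,
    and references are not followed.
    """
    max_depth, max_breadth = depth, 0
    if not isinstance(schema, dict):
        return max_depth, max_breadth
    for key, value in schema.items():
        child_depth = depth
        if key in COMBINATORS and isinstance(value, list):
            # Entering a combinator: record its width and go one level deeper.
            max_breadth = max(max_breadth, len(value))
            child_depth = depth + 1
        if isinstance(value, dict):
            children = [value]
        elif isinstance(value, list):
            children = value
        else:
            continue
        for child in children:
            d, b = combinator_stats(child, child_depth)
            max_depth, max_breadth = max(max_depth, d), max(max_breadth, b)
    return max_depth, max_breadth


def within_limits(schema: Any, max_depth: int = 3, max_breadth: int = 16) -> bool:
    """Hypothetical policy check; the default limits are arbitrary."""
    depth, breadth = combinator_stats(schema)
    return depth <= max_depth and breadth <= max_breadth
```

The number of evaluation paths is bounded by roughly `breadth ** depth`, so the example above (breadth 5, depth 3) gives `5 ** 3 = 125`. An implementation could reject or warn on a schema before evaluation when `within_limits` returns `False`.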

### Dynamic References

The paper ["The Complexity of JSON Schema: Undecidable, Expensive, Yet
Tractable" (Caroni et al., 2024)](https://doi.org/10.1145/3632891) has shown
that validation in the presence of dynamic references is PSPACE-complete. The
paper describes a method for replacing dynamic references with static ones, but
doing so can cause the size of the schema to grow exponentially. Implementations
should be aware of this risk and may wish to implement the method described in
the paper or impose limits on dynamic reference resolution.

### Infinite Loops and Cycles

Infinite loops can occur when evaluating schemas that produce cycles during
reference resolution. These cycles may involve multiple schemas. Not all
recursive schemas create loops, but implementations are advised to detect these
cycles and terminate evaluation when they are encountered.
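
A minimal sketch of one way to detect such a cycle, assuming a hypothetical `registry` that maps absolute URIs to schema objects and `$ref` values that are already absolute. It only follows `$ref` chains, which stay at the same instance location; applicators such as `properties` are ignored because they move to a different part of the instance and therefore do not loop by themselves.

```python
from typing import Dict, List, Optional


def find_ref_cycle(registry: Dict[str, dict], start: str) -> Optional[List[str]]:
    """Follow `$ref` from `start`; return the cycle as a list of URIs, or None.

    Assumes every `$ref` target is an absolute URI present in `registry`.
    """
    seen: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            # We came back to a schema without consuming any of the instance.
            return seen[seen.index(current):] + [current]
        seen.append(current)
        current = registry[current].get("$ref")
    return None


registry = {
    "https://example.com/a": {"$ref": "https://example.com/b"},
    "https://example.com/b": {"$ref": "https://example.com/a"},
    # Recursive but not a loop: the reference sits under `properties`.
    "https://example.com/tree": {
        "type": "object",
        "properties": {"child": {"$ref": "https://example.com/tree"}},
    },
}
```

Here `find_ref_cycle(registry, "https://example.com/a")` reports a cycle spanning two schemas, while the recursive `tree` schema yields `None`.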

### Schema Identity and Collisions

Schemas may declare an `$id` to identify themselves or have embedded schemas
that declare an `$id`. An attacker may attempt to register a schema with an
`$id` that collides with a previously registered schema, or that differs only by
case, encoding, or other URI normalization quirks. Such collisions could result
in overwriting or shadowing of trusted schemas.

Implementations should consider rejecting schemas that have identifiers
(including embedded schema identifiers) that conflict with registered schemas
and should apply consistent URI normalization and comparison logic to detect and
prevent conflicts.
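
The following Python sketch shows one possible shape for such a check. `SchemaRegistry`, `normalize_id`, and `collect_ids` are hypothetical names, embedded `$id` values are assumed to be absolute already, and the normalization shown (lowercasing the scheme and authority) is only an illustration; what matters for conflict detection is that the same comparison is applied everywhere. Rejecting on conflict is one policy among several an implementation might offer.

```python
from typing import Any, Dict, Iterator, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_id(uri: str) -> str:
    """Lowercase the scheme and authority; leave everything else untouched."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


def collect_ids(schema: Any) -> Iterator[Tuple[str, dict]]:
    """Yield (identifier, resource) for the schema and any embedded schemas.

    Naive walk: assumes embedded `$id` values are already absolute URIs.
    """
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            yield schema["$id"], schema
        for value in schema.values():
            yield from collect_ids(value)
    elif isinstance(schema, list):
        for item in schema:
            yield from collect_ids(item)


class SchemaRegistry:
    def __init__(self) -> None:
        self._schemas: Dict[str, dict] = {}

    def register(self, schema: dict) -> None:
        staged: Dict[str, dict] = {}
        for raw_id, resource in collect_ids(schema):
            key = normalize_id(raw_id)
            if key in self._schemas or key in staged:
                raise ValueError(f"identifier conflict: {key}")
            staged[key] = resource
        self._schemas.update(staged)  # commit only if nothing conflicted

    def get(self, uri: str) -> dict:
        return self._schemas[normalize_id(uri)]
```

Staging the identifiers before committing means a schema whose embedded `$id` conflicts is rejected as a whole, rather than being partially registered.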

Member

When would we not want to allow that? I think we should upgrade this to a MUST.

Member

I think (not in front of a computer currently) that normalization is a requirement of the sections on reference resolution.

That said, I wouldn't be opposed to being more strict around preventing overwrites.

Would a MUST-level requirement be appropriate here, or would it fit more in one of the main body sections?

Member Author

First of all ...

> Would a MUST-level requirement be appropriate here, or would it fit more in one of the main body sections?

Right. The Security Considerations section should be an analysis of the security considerations both mitigated and not mitigated by requirements as defined in the specification. It should not make normative statements, but can reference parts of the spec that do.

> When would we not want to allow that? I think we should upgrade this to a MUST.

There are two "shoulds" in this sentence, so I'm not sure which one (or both) are being referred to.

> Implementations should consider rejecting schemas that have identifiers (including embedded schema identifiers) that conflict with registered schemas

A reasonable alternative could be to ignore the schema and use the registered one instead (perhaps with a warning). A scenario where I think this might make sense is with a bundled schema that includes a common schema because it might be used in a variety of implementations and doesn't know if the schema will be registered or not. If the implementation doesn't have that schema registered, then it's available in the bundle. If the implementation does have it registered, it should use the registered version. I think there's a variety of different situations and use cases where not rejecting the schema could make sense. I'd like implementations to be able to provide users with the ability to choose how this should be handled. But, suggesting that they should be rejected is, I think, a good "secure by default" behavior.

> and should apply consistent URI normalization and comparison logic to detect and prevent conflicts.

Looking into this, I just noticed the following from RFC 3987

> Applications using IRIs as identity tokens with no relationship to a protocol MUST use the Simple String Comparison (see section 5.3.1). All other applications MUST select one of the comparison practices from the Comparison Ladder (see section 5.3

JSON Schema describes our use of IRIs as identity tokens, not locators, which means implementations should limit themselves to Simple String Comparison. That's not what our spec currently says. It says to use the normalization procedure defined in 5.3, which isn't quite correct either. We'd need to be more specific about which comparisons to use. For example, we wouldn't require scheme-based or protocol-based normalization because it would be too onerous a requirement for implementers to implement scheme-based requirements for any scheme they might encounter.

From a security perspective, it doesn't matter what normalization is performed, only that it's consistent. Maybe that's the update we should make for now and discuss what comparison/normalization should be required in the spec in a separate issue.

Member

I can think of a clear use case for this.
It may be that implementations have chosen poor architecture and only allow a single instance of the implementation.
Say I have a JSON Schema, and I want to evaluate the outcome if I change a part of one of the referenced schemas? If it was already registered with an ID, and I want to override that, I should be able to.

Member

I think that comes down to an implementation option to allow it. The default behavior should be to reject an overwrite attempt.


### External Schema Resolution

JSON Schema implementations are expected to resolve external references using a
local registry. Although the specification allows for dynamic retrieval
(`https:` to fetch schemas over HTTP, or `file:` to read schemas from disk),
this behavior is discouraged unless it's intrinsic to the use case, such as with
JSON Hyper-Schema.

Resolving schemas dynamically introduces several security concerns, each of
which can be mitigated by limiting or controlling resolution behavior. A tightly
scoped schema resolution policy significantly reduces the attack surface,
especially when validating untrusted data.

Implementations are advised to disable dynamic retrieval by default and limit
external schema resolution to the local registry unless dynamic retrieval is
explicitly enabled. If enabled, they should consider limiting the number of
dynamic retrievals a validation can perform and defining timeouts on dynamic
retrievals to reduce the risk of resource exhaustion.
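
A minimal sketch of such a policy, assuming a hypothetical `Resolver` that consults a local registry first and only calls a caller-supplied `fetch(uri, timeout)` function when retrieval has been explicitly enabled. The class names, defaults, and the `fetch` signature are illustrative and not part of any real library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RetrievalPolicy:
    enabled: bool = False          # dynamic retrieval is off unless opted in
    max_retrievals: int = 10       # budget per resolver (arbitrary default)
    timeout_seconds: float = 5.0   # passed to the fetch function


class Resolver:
    def __init__(
        self,
        registry: Dict[str, dict],
        policy: Optional[RetrievalPolicy] = None,
        fetch: Optional[Callable[[str, float], dict]] = None,
    ) -> None:
        self.registry = dict(registry)
        self.policy = policy or RetrievalPolicy()
        self.fetch = fetch
        self.retrievals = 0

    def resolve(self, uri: str) -> dict:
        if uri in self.registry:
            return self.registry[uri]
        if not self.policy.enabled or self.fetch is None:
            raise LookupError(f"{uri} is not registered and retrieval is disabled")
        if self.retrievals >= self.policy.max_retrievals:
            raise LookupError("retrieval budget exhausted")
        self.retrievals += 1
        schema = self.fetch(uri, self.policy.timeout_seconds)
        self.registry[uri] = schema  # cache so repeat lookups stay local
        return schema
```

With the default policy, `Resolver(registry).resolve(uri)` never touches the network; a caller has to pass `RetrievalPolicy(enabled=True)` and a `fetch` function to opt in.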

#### HTTP(S) Specific Threats

Allowing schema references to resolve over HTTP or HTTPS introduces several
threats:

* **Denial of Service (DoS)**: Validation may hang or become slow if a
referenced schema URL is slow to respond or never returns.
* **Server-Side Request Forgery (SSRF)**: Malicious schemas can reference
internal-only services using hostnames like localhost or private IPs.
Implementations are advised to restrict HTTP schema retrieval to a
configurable allowlist of trusted domains.
* **Lack of Integrity Guarantees**: Retrieved schemas may be altered in transit
or change between validations. If network retrieval is allowed,
implementations are advised to only allow retrieval over HTTPS unless
specifically configured to allow unsecured transport.
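
The allowlist and HTTPS-only advice above might be sketched as follows. `is_retrieval_allowed` and its parameters are hypothetical, and the check covers only the URL as written: an HTTP client that follows redirects would need to apply the same check to each redirect target.

```python
from typing import Collection
from urllib.parse import urlsplit


def is_retrieval_allowed(
    url: str,
    allowed_hosts: Collection[str],
    allow_insecure: bool = False,
) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist.

    `allowed_hosts` is assumed to contain lowercase hostnames.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        if not allow_insecure:
            return False
    elif parts.scheme != "https":
        return False
    # `hostname` is already lowercased and excludes any port or userinfo.
    return parts.hostname is not None and parts.hostname in allowed_hosts
```

Because hosts such as `localhost` or private IP addresses are simply absent from the allowlist, references to internal-only services are refused without needing a separate blocklist.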

#### File System Specific Threats

Allowing resolution from the local filesystem (`file:` URIs) raises different
issues:

* **Information Disclosure**: Malicious schemas may access sensitive files on
the system. Implementations should consider restricting filesystem access to
a specific schema directory tree.
* **Cross-Context Access**: A schema fetched from HTTP may try to reference a
schema on the filesystem. Implementations are advised to allow resolving
`file:` references only when the referencing schema was itself loaded from the
file system, similar to same-origin policies in web browsers.
* **Exposing Internal Paths**: Schemas that use `file:` URIs may reveal
host-specific filesystem details in two ways: through the `$id` itself or
through schema locations in validation output. Implementations are advised to
reject `$id` values that use the `file:` scheme. If `file:` URIs are permitted
internally, implementations are advised to sanitize them (for example, by
converting them to relative URIs) to avoid exposing host filesystem structure
to users.
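
As a sketch of restricting filesystem access to a schema directory tree and reporting relative locations, assuming POSIX-style paths and Python 3.9 or later (for `Path.is_relative_to`). The function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def resolve_file_uri(uri: str, schema_root: Path) -> Path:
    """Map a `file:` URI to a path, refusing anything outside `schema_root`."""
    parts = urlsplit(uri)
    if parts.scheme != "file" or parts.netloc not in ("", "localhost"):
        raise ValueError(f"not a local file URI: {uri}")
    root = Path(schema_root).resolve()
    # resolve() collapses `..` segments and symlinks before the containment check.
    target = Path(unquote(parts.path)).resolve()
    if not target.is_relative_to(root):
        raise PermissionError("schema path is outside the schema directory")
    return target


def display_location(target: Path, schema_root: Path) -> str:
    """Relative form for output, so host directory structure is not exposed."""
    return target.relative_to(Path(schema_root).resolve()).as_posix()
```

A reference such as `file:///srv/schemas/../../etc/passwd` resolves outside the root and is refused, while locations reported to users take the form `common/address.json` rather than an absolute host path.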

### Vocabulary-Specific Risks

Third-party JSON Schema vocabularies may introduce additional risks.
Implementers are advised to consult the specifications of any extensions they
support and take into account their security considerations as well.

Member

We should update this to use "extension" language since we've removed "vocabulary".


## IANA Considerations
