New Security Considerations #1618
Open: jdesrosiers wants to merge 4 commits into json-schema-org:main from jdesrosiers:sec-cons
+123 −26
@@ -2029,32 +2029,129 @@ SHOULD use the terms defined by this document to do so.

## Security Considerations {#security}

Removed (−26 lines):

Both schemas and instances are JSON values. As such, all security considerations
defined in [RFC 8259][rfc8259] apply.

Instances and schemas are both frequently written by untrusted third parties, to
be deployed on public Internet servers. Implementations should take care that
the parsing and evaluating against schemas does not consume excessive system
resources. Implementations MUST NOT fall into an infinite loop.

A malicious party could cause an implementation to repeatedly collect a copy of
a very large value as an annotation. Implementations SHOULD guard against
excessive consumption of system resources in such a scenario.

Servers MUST ensure that malicious parties cannot change the functionality of
existing schemas by uploading a schema with a pre-existing or very similar
`$id`.

Individual JSON Schema extensions are liable to also have their own security
considerations. Consult the respective specifications for more information.

Schema authors should take care with `$comment` contents, as a malicious
implementation can display them to end-users in violation of a spec, or fail to
strip them if such behavior is expected.

A malicious schema author could place executable code or other dangerous
material within a `$comment`. Implementations MUST NOT parse or otherwise take
action based on `$comment` contents.

Added (+123 lines):

While schemas and instances are not always represented as JSON text, they are
defined in terms of the JSON data model. As such, the security considerations
defined in [RFC 8259][rfc8259] may still apply in environments where text-based
representations are used, particularly those considerations related to parsing,
number precision, and structural limitations.

Schemas and instances are frequently authored by untrusted parties.
Implementations that accept or evaluate such inputs may be exposed to several
classes of attack, particularly denial-of-service (DoS) by means of resource
exhaustion.

### Nested `anyOf`/`oneOf`

One risk for resource exhaustion in JSON Schema arises from the nested use of
`anyOf` and `oneOf`. While a single combinator keyword with multiple subschemas
is typically manageable, nesting them causes the number of evaluation paths to
grow exponentially.

For example, a `oneOf` with 5 subschemas, each containing another `oneOf` with 5
options, results in 25 evaluation paths. Adding a third level increases this to
125, and so on. Attackers can exploit this by crafting schemas that force
validators to explore a large number of branches.

This evaluation explosion is particularly dangerous when each path involves
expensive work such as collecting large annotations or evaluating complex
regular expressions. These effects multiply across paths and can result in
excessive CPU or memory consumption, leading to denial-of-service.

Implementations that evaluate untrusted schema are encouraged to take steps to
mitigate these threats with measures such as bounding combinator keyword depth
and breadth, limiting memory used for annotation collection, and guarding
against resource-intensive validations such as pathological regexes.
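
As a rough sketch of the depth and breadth bounding described above, the following Python estimates the worst-case number of evaluation paths through nested `anyOf`/`oneOf` and rejects schemas that exceed configured limits. The function names (`count_paths`, `check_combinator_limits`, `nested_one_of`) and the default limits are illustrative assumptions, not part of any specification. Only the two combinator keywords are traversed; a real implementation would also need to descend through other applicators and references.

```python
from typing import Any

# Assumption: only these two keywords are treated as combinators in this sketch.
COMBINATORS = ("anyOf", "oneOf")


def count_paths(schema: Any) -> int:
    """Estimate worst-case evaluation paths through nested anyOf/oneOf."""
    if not isinstance(schema, dict):
        return 1  # boolean schemas contribute a single path
    total = 1
    for keyword in COMBINATORS:
        branches = schema.get(keyword)
        if isinstance(branches, list) and branches:
            # Each branch may itself fan out, so the counts add per branch
            # and multiply across sibling combinator keywords.
            total *= sum(count_paths(branch) for branch in branches)
    return total


def check_combinator_limits(schema: Any, max_depth: int = 3,
                            max_breadth: int = 16, depth: int = 0) -> None:
    """Raise ValueError if combinator nesting or fan-out exceeds the limits."""
    if not isinstance(schema, dict):
        return
    for keyword in COMBINATORS:
        branches = schema.get(keyword)
        if not isinstance(branches, list):
            continue
        if depth + 1 > max_depth:
            raise ValueError(f"{keyword} nested deeper than {max_depth} levels")
        if len(branches) > max_breadth:
            raise ValueError(f"{keyword} has more than {max_breadth} subschemas")
        for branch in branches:
            check_combinator_limits(branch, max_depth, max_breadth, depth + 1)


def nested_one_of(levels: int, width: int = 5) -> Any:
    """Build a oneOf nested `levels` deep with `width` options per level."""
    if levels == 0:
        return {"type": "null"}
    return {"oneOf": [nested_one_of(levels - 1, width) for _ in range(width)]}
```

With these definitions, `count_paths(nested_one_of(2))` is 25 and `count_paths(nested_one_of(3))` is 125, matching the figures in the example, while `check_combinator_limits(nested_one_of(4))` raises under the default depth limit of 3.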

### Dynamic References

The paper ["The Complexity of JSON Schema: Undecidable, Expensive, Yet
Tractable" (Caroni et al., 2024)](https://doi.org/10.1145/3632891) has shown
that validation in the presence of dynamic references is PSPACE-complete. The
paper describes a method for replacing dynamic references with static ones, but
doing so can cause the size of the schema to grow exponentially. Implementations
should be aware of this risk and may wish to implement the method described in
the paper or impose limits on dynamic reference resolution.

### Infinite Loops and Cycles

Infinite loops can occur when evaluating schemas that produce cycles during
reference resolution. These cycles may involve multiple schemas. Not all
recursive schemas create loops, but implementations are advised to detect these
cycles and terminate evaluation when they are encountered.
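
One possible way to detect such cycles, sketched here in Python, is to track the (schema location, instance location) pairs that are currently being evaluated and to stop when a pair is revisited. `CycleGuard`, `SchemaCycleError`, `resolve_pointer`, and `follow_refs` are hypothetical names for this illustration. The sketch handles only same-document `$ref` values written as JSON Pointer fragments, which is an assumption made to keep it short.

```python
from contextlib import contextmanager


class SchemaCycleError(Exception):
    """Raised when evaluation revisits a location it is still evaluating."""


class CycleGuard:
    """Tracks (schema location, instance location) pairs under evaluation."""

    def __init__(self):
        self._active = set()

    @contextmanager
    def visiting(self, schema_location: str, instance_location: str):
        key = (schema_location, instance_location)
        if key in self._active:
            raise SchemaCycleError(
                f"cycle at schema {schema_location!r}, instance {instance_location!r}"
            )
        self._active.add(key)
        try:
            yield
        finally:
            self._active.discard(key)


def resolve_pointer(root, ref: str):
    """Resolve a same-document reference such as "#" or "#/$defs/a"."""
    node = root
    for token in ref.lstrip("#").split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def follow_refs(root, ref: str, guard: CycleGuard, instance_location: str = ""):
    """Follow a chain of $ref keywords without consuming any of the instance."""
    with guard.visiting(ref, instance_location):
        schema = resolve_pointer(root, ref)
        if isinstance(schema, dict) and "$ref" in schema:
            return follow_refs(root, schema["$ref"], guard, instance_location)
        return schema
```

Because the instance location is part of the key, a recursive schema that descends into the instance on each step (a tree of `children`, for example) is not flagged, while two definitions that reference only each other are.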

### Schema Identity and Collisions

Schemas may declare an `$id` to identify themselves or have embedded schemas
that declare an `$id`. An attacker may attempt to register a schema with an
`$id` that collides with a previously registered schema, or that differs only by
case, encoding, or other URI normalization quirks. Such collisions could result
in overwriting or shadowing of trusted schemas.

Implementations should consider rejecting schemas that have identifiers
(including embedded schema identifiers) that conflict with registered schemas
and should apply consistent URI normalization and comparison logic to detect and
prevent conflicts.
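
The following Python sketch shows one way a registry might reject conflicting identifiers. `SchemaRegistry`, `normalize_id`, and `collect_ids` are hypothetical names, and the particular normalization used here (lowercasing the scheme and authority and dropping the fragment) is only an assumption; the point is that the same normalization is applied on every registration. For brevity the walk visits every nested object, so an `$id` string inside `const` or `enum` data would also be picked up, which a real implementation would want to avoid.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_id(uri: str) -> str:
    """Apply one fixed normalization: lowercase scheme/authority, drop fragment."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


def collect_ids(schema, base: str = "") -> list:
    """Collect the normalized `$id` of a schema and of every embedded schema."""
    ids = []
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            base = urljoin(base, schema["$id"])  # embedded ids may be relative
            ids.append(normalize_id(base))
        for value in schema.values():
            ids.extend(collect_ids(value, base))
    elif isinstance(schema, list):
        for item in schema:
            ids.extend(collect_ids(item, base))
    return ids


class SchemaRegistry:
    """Maps normalized identifiers to the document that declared them."""

    def __init__(self):
        self._documents = {}

    def register(self, schema) -> None:
        ids = collect_ids(schema)
        # Reject both clashes with registered schemas and duplicates
        # within the document being registered.
        conflicts = sorted(
            {i for i in ids if i in self._documents or ids.count(i) > 1}
        )
        if conflicts:
            raise ValueError(f"conflicting schema identifiers: {conflicts}")
        for identifier in ids:
            self._documents[identifier] = schema
```

Rejecting is the default in this sketch; an implementation could instead expose an option to keep the registered schema or to allow an explicit overwrite.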

### External Schema Resolution

JSON Schema implementations are expected to resolve external references using a
local registry. Although the specification allows for dynamic retrieval
(`https:` to fetch schemas over HTTP, or `file:` to read schemas from disk),
this behavior is discouraged unless it's intrinsic to the use case, such as with
JSON Hyper-Schema.

Resolving schemas dynamically introduces several security concerns, each of
which can be mitigated by limiting or controlling resolution behavior. A tightly
scoped schema resolution policy significantly reduces the attack surface,
especially when validating untrusted data.

Implementations are advised to disable dynamic retrieval by default and limit
external schema resolution to the local registry unless dynamic retrieval is
explicitly enabled. If enabled, they should consider limiting the number of
dynamic retrievals a validation can perform and defining timeouts on dynamic
retrievals to reduce the risk of resource exhaustion.
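
A resolution policy along these lines might look like the following Python sketch. `RetrievalPolicy`, `Resolver`, and the injected `fetch` callable are hypothetical; the defaults are arbitrary illustrative values. The sketch assumes the caller supplies `fetch(uri, timeout_seconds)` and that it honors the timeout.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class RetrievalPolicy:
    allow_dynamic: bool = False   # dynamic retrieval is off unless enabled
    max_retrievals: int = 10      # cap on retrievals per resolver
    timeout_seconds: float = 5.0  # passed to every fetch call


class Resolver:
    """Resolves references from a local registry, optionally falling back to fetch."""

    def __init__(self, registry: Dict[str, Any],
                 policy: Optional[RetrievalPolicy] = None,
                 fetch: Optional[Callable[[str, float], Any]] = None):
        self.registry = dict(registry)
        self.policy = policy or RetrievalPolicy()
        self.fetch = fetch
        self.retrievals = 0

    def resolve(self, uri: str) -> Any:
        if uri in self.registry:
            return self.registry[uri]
        if not self.policy.allow_dynamic or self.fetch is None:
            raise LookupError(f"{uri} is not registered and retrieval is disabled")
        if self.retrievals >= self.policy.max_retrievals:
            raise RuntimeError("dynamic retrieval limit reached")
        self.retrievals += 1
        schema = self.fetch(uri, self.policy.timeout_seconds)
        self.registry[uri] = schema  # cache so repeated references are not refetched
        return schema
```

Creating one `Resolver` per validation makes the retrieval cap apply per validation, as suggested above.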

#### HTTP(S) Specific Threats

Allowing schema references to resolve over HTTP or HTTPS introduces several
threats:

* **Denial of Service (DoS)**: Validation may hang or become slow if a
  referenced schema URL is slow to respond or never returns.
* **Server-Side Request Forgery (SSRF)**: Malicious schemas can reference
  internal-only services using hostnames like localhost or private IPs.
  Implementations are advised to restrict HTTP schema retrieval to a
  configurable allowlist of trusted domains.
* **Lack of Integrity Guarantees**: Retrieved schemas may be altered in transit
  or change between validations. If network retrieval is allowed,
  implementations are advised to only allow retrieval over HTTPS unless
  specifically configured to allow unsecured transport.
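
The allowlist and HTTPS-only advice in the list above could be checked before any request is made, as in this Python sketch. `is_retrieval_allowed` and its parameters are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit


def is_retrieval_allowed(url: str, allowed_hosts, allow_insecure: bool = False) -> bool:
    """Return True only for permitted schemes and allowlisted hosts."""
    parts = urlsplit(url)
    allowed_schemes = {"https", "http"} if allow_insecure else {"https"}
    if parts.scheme not in allowed_schemes:
        return False
    # `hostname` is lowercased by urllib and excludes userinfo and port,
    # so "https://trusted.example@evil.example/" is checked as "evil.example".
    host = parts.hostname
    return host is not None and host in allowed_hosts
```

This check covers only the URL as written. It does not address redirects or allowlisted names that resolve to internal addresses, which an implementation would need to handle in the HTTP client itself.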

#### File System Specific Threats

Allowing resolution from the local filesystem (`file:` URIs) raises different
issues:

* **Information Disclosure**: Malicious schemas may access sensitive files on
  the system. Implementations should consider restricting filesystem access to
  a specific schema directory tree.
* **Cross-Context Access**: A schema fetched from HTTP may try to reference a
  schema on the filesystem. Implementations are advised to allow resolving
  `file:` references only when the referencing schema was itself loaded from the
  file system, similar to same-origin policies in web browsers.
* **Exposing Internal Paths**: Schemas that use `file:` URIs may reveal
  host-specific filesystem details in two ways: through the `$id` itself or
  through schema locations in validation output. Implementations are advised to
  reject `$id` values that use the `file:` scheme. If `file:` URIs are permitted
  internally, implementations are advised to sanitize them (for example, by
  converting them to relative URIs) to avoid exposing host filesystem structure
  to users.
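
The directory restriction, the same-origin style rule, and the path sanitization from the list above can be sketched in Python as follows. `may_resolve_file_ref`, `public_location`, and the `schema_root` parameter are hypothetical names; the sketch assumes Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import url2pathname


def _file_uri_to_path(uri: str) -> Path:
    """Convert a file: URI to an absolute path with ".." and symlinks resolved."""
    return Path(url2pathname(urlsplit(uri).path)).resolve()


def may_resolve_file_ref(target_uri: str, referrer_uri: str, schema_root) -> bool:
    """Allow file: targets only from file: referrers and only under schema_root."""
    if urlsplit(target_uri).scheme != "file":
        return False  # this helper only decides file: references
    if urlsplit(referrer_uri).scheme != "file":
        return False  # e.g. a schema fetched over HTTP(S) may not reach the disk
    return _file_uri_to_path(target_uri).is_relative_to(Path(schema_root).resolve())


def public_location(uri: str, schema_root) -> str:
    """Return a root-relative location suitable for validation output.

    Raises ValueError if the URI points outside schema_root.
    """
    root = Path(schema_root).resolve()
    return _file_uri_to_path(uri).relative_to(root).as_posix()
```

Resolving the path before the containment check is what defeats `..` traversal; comparing the raw URI strings would not.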

### Vocabulary-Specific Risks

Third-party JSON Schema vocabularies may introduce additional risks.
Implementers are advised to consult the specifications of any extensions they
support and take into account their security considerations as well.

Review comment:
We should update this to use "extension" language since we've removed "vocabulary".

## IANA Considerations

Review comment:
When would we not want to allow that? I think we should upgrade this to a MUST.

Review comment:
I think (not in front of a computer currently) that normalization is a requirement of the sections on reference resolution.
That said, I wouldn't be opposed to being more strict around preventing overwrites.
Would a MUST-level requirement be appropriate here, or would it fit more in one of the main body sections?

Review comment:
First of all ...
Right. The Security Considerations section should be an analysis of the security considerations both mitigated and not mitigated by requirements as defined in the specification. It should not make normative statements, but can reference parts of the spec that do.
There are two "shoulds" in this sentence, so I'm not sure which one (or both) is being referred to.
A reasonable alternative could be to ignore the schema and use the registered one instead (perhaps with a warning). A scenario where I think this might make sense is with a bundled schema that includes a common schema because it might be used in a variety of implementations and doesn't know if the schema will be registered or not. If the implementation doesn't have that schema registered, then it's available in the bundle. If the implementation does have it registered, it should use the registered version. I think there's a variety of different situations and use cases where not rejecting the schema could make sense. I'd like implementations to be able to provide users with the ability to choose how this should be handled. But, suggesting that they should be rejected is, I think, a good "secure by default" behavior.
Looking into this, I just noticed the following from RFC 3987
JSON Schema describes our use of IRIs as identity tokens, not locators, which means implementations should limit themselves to Simple String Comparison. That's not what our spec currently says. It says to use the normalization procedure defined in 5.3, which isn't quite correct either. We'd need to be more specific about which comparisons to use. For example, we wouldn't require scheme-based or protocol-based normalization because it would be too onerous a requirement for implementers to implement scheme-based requirements for any scheme they might encounter.
From a security perspective, it doesn't matter what normalization is performed, only that it's consistent. Maybe that's the update we should make for now and discuss what comparison/normalization should be required in the spec in a separate issue.

Review comment:
I can think of a clear use case for this.
It may be that implementations have chosen poor architecture and only allow a single instance of the implementation.
Say I have a JSON Schema, and I want to evaluate the outcome if I change a part of one of the referenced schemas? If it was already registered with an ID, and I want to override that, I should be able to.

Review comment:
I think that comes down to an implementation option to allow it. The default behavior should be to reject an overwrite attempt.