Description
Brief Description
JSON Schema is a rich language for expressing constraints on JSON data. If we strictly consider JSON Schema validation (rather than any other use of JSON Schema), in many cases there are multiple ways to express the same constraints. For example, the schema:
{
"oneOf": [
{"const": "foo"},
{"const": "bar"}
]
}
will have the same validation outcome on all instances as the schema:
{"enum": ["foo", "bar"]}
One might say that this second schema is in some way "better" than the first one in some way that could be made precise.
The same is true for the schemas {"required": ["foo"]}
and {"title": "My Schema", "required": ["foo"]}
, and one might say the first one is "better" than the second for the purpose of validation.
We can define two schemas to be "equivalent" if they have this property that any instance is valid under one if and only if it is valid under the other, and if we have two equivalent schemas S
and S'
we might wish to define an algorithm for transforming these schemas into a form which is "canonical" or "normal" such as above.
There are existing attempts to do this for various use cases, but no central place where a self-contained set of normalization rules are written down and a self-contained tool exists to perform the procedure. Let's try and write a simple one!
Expected Outcomes
- Investigate the existing implementations of normalization in the wild. There are at least two known ones, one being here.
- Define a set of normalization rules, with configurability for cases where there are multple reasonable canonical forms
- Define a set of test cases for schemas which are equivalent under these rules, and for the target canonical form for each set of schemas
- Write a Python library which performs the normalization and emits the normalized schema
- Empirically test our normalization procedure by running normalized schemas through Bowtie and comparing whether a given implementation returns the same results
Skills Required
- An existing understanding of JSON Schema's keywords, which can be used to think about areas which might create possible "denormalization" (e.g. keywords which when used together overlap)
- Familiarity writing Python, and ideally using JSON Schema from Python
- Experience testing pieces of software by writing test cases, here likely in the form of writing JSON Schema + instance examples
- Careful diligence in reading and understanding the existing procedures used (in the link above, as well as in a number of JSON Schema journal articles) and the ability to compare the previous work with each other
Mentors
@Julian
Expected Difficulty
medium
Expected Time Commitment
175