Add schemas for all JSON extracts #731

Closed
wants to merge 13 commits into from

@tidoust tidoust commented Sep 14, 2022

This provides a first level of schema validation for curated data extracts, see #657 for context.

Goal is to make it easier to detect and document (through a changelog, so also useful for #704) situations where we change the structure of data extracts.

Schemas, notably those that deal with parsed IDL structures, could go deeper into details.

Tests are run against the curated version of data. That is not necessary for extracts that aren't actually curated (dfns, headings, ids, links, refs), just more convenient not to have branching logic in the test code.

Creating this as a draft pull request as 69 of the new tests currently fail, either because the extraction logic in Reffy needs to be slightly improved to create more consistent data structures, or because of actual issues in the specs themselves (e.g. invalid URL fragments).
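
To illustrate what an "invalid URL fragment" can look like, here is a minimal sketch in Python. It is not the check used in this PR. The function name `has_valid_fragment` is hypothetical, and the rule applied is simply the fragment grammar from RFC 3986, which may differ from what the schema validator actually enforces.

```python
import re
from urllib.parse import urlsplit

# Characters RFC 3986 allows in a fragment: pchar, "/" and "?",
# where pchar covers unreserved, sub-delims, ":", "@" and percent-escapes.
_FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def has_valid_fragment(url):
    """Return True if the fragment of `url` (possibly empty) matches RFC 3986."""
    fragment = urlsplit(url).fragment
    return _FRAGMENT.fullmatch(fragment) is not None
```

Under this rule, a URL such as `https://example.org/#a#b` is rejected because a second `#` cannot appear inside a fragment, whereas `https://example.org/#a%20b` is accepted.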

Options need to be specified in the constructor and the old `format` option no
longer exists as far as I can tell. The options are set to report all errors
and to include the validated data, to make it easier to understand what each
error is. The URI format seems to be pickier about fragments.
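
The two reporting choices described above (report every error instead of stopping at the first one, and attach the offending value to each error) can be sketched as follows. This is a hand-written illustration in Python, not the validator or the test code used in this PR. The function `validate_entries` and the `href`/`title` fields are hypothetical.

```python
def validate_entries(entries):
    """Check a list of dict entries and return every problem found.

    Hypothetical mini-validator: each entry is expected to have string
    `href` and `title` properties. An empty result means the data is valid.
    """
    errors = []
    for index, entry in enumerate(entries):
        for key in ("href", "title"):
            value = entry.get(key)
            if not isinstance(value, str):
                # Keep going after a failure so that all errors get reported,
                # and record the offending value next to the message.
                errors.append({
                    "path": f"/{index}/{key}",
                    "message": "must be a string",
                    "data": value,
                })
    return errors
```

Having the path and the rejected value in the same record means a failing test can be understood without opening the extract file.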

The update also raises an error when an ID starts with a `#`. That's allowed in
theory but the few cases where this happens in practice are clearly unintended.
An empty string is usually the sign that extraction failed to work as intended.
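
A minimal sketch of the kind of ID check described here, written in Python for illustration only. The function name `check_id` and the messages are hypothetical. The actual constraint would more likely be expressed in the JSON schema itself.

```python
def check_id(value):
    """Return a short problem description for a suspicious ID, or None."""
    if value == "":
        # An empty string usually means that extraction did not work.
        return "empty ID: extraction probably failed"
    if value.startswith("#"):
        # Allowed in theory, but almost certainly a mistake in practice.
        return "ID starts with '#': probably unintended"
    return None
```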

The update also drops the check on IDs that start with `#`. That analysis should
typically rather be done in Strudy.
A couple of headings don't have IDs and there is not much that we can do about
it.
Linked to w3c/reffy#1075

This will only work once a version of Reffy has been released that exposes the
appropriate schema validation function.
No need to check whether the `validate` function exists. Test will automatically
fail if it doesn't whereas we're expecting one.
tidoust added a commit that referenced this pull request Sep 27, 2022
This makes use of the new schema validation function in Reffy to make sure that
the curated data Webref produces follow expected schemas, see:
  w3c/reffy#1075

This replaces #731 and fixes #657.

Schemas, notably those that deal with parsed IDL structures, could go deeper
into details. To be improved over time.

Tests are run against the curated version of data. That is not necessary for
extracts that aren't actually curated (dfns, headings, ids, links, refs), just
more convenient not to have branching logic in the test code.

tidoust commented Sep 27, 2022

Superseded by #749.

@tidoust tidoust closed this Sep 27, 2022
@dontcallmedom dontcallmedom deleted the schemas branch March 22, 2024 20:10