-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract CDDL definitions #1723
Extract CDDL definitions #1723
Conversation
Needed for w3c/webref#1353. With this update, Reffy now looks for and extracts CDDL content defined in `<pre class="cddl">` block. The logic is vastly similar to the logic used for IDL. Shared code was factored out accordingly. Something specific about CDDL: on top of generating text extracts, the goal is also to create one extract per CDDL module that the spec defines. To associate a `<pre>` block with one or more CDDL module, the code looks for a possible `data-cddl-module` module, or for module names in the `class` attribute (prefixed by `cddl-` or suffixed by `-cddl`). The former isn't used by any spec but is the envisioned mechanism in Bikeshed to define the association, the latter is the convention currently used in the WebDriver BiDi specification. When a spec defines modules, CDDL defined in a `<pre>` block with no explicit module annotation is considered to be defined for all modules (not doing so would essentially mean the CDDL would not be defined for any module, which seems weird). When there is CDDL, the extraction produces: 1. an extract that contains all CDDL definitions: `cddl/[shortname].cddl` 2. one extract per CDDL module: `cddl/[shortname]-[modulename].cddl` (I'm going to assume that no one is ever going to define a module name that would make `[shortname]-[modulename]` collide with the shortname of another spec). Note: some specs that define CDDL do not flag the `<pre>` blocks in any way (Open Screen Protocol, WebAuthn). Extraction won't work for them for now. Also, there are a couple of places in the WebDriver BiDi spec that use a `<pre class="cddl">` block to *reference* a CDDL construct defined elsewhere. Extraction will happily include these references as well, leading to CDDL extracts that contain invalid CDDL. These need fixing in the specs.
When a spec defines CDDL modules, the union of all CDDL is now written to a file named `[shortname]-all.cddl` instead of simply `[shortname].cddl`. This is meant to make it slightly clearer that the union of all CDDL file is not necessarily the panacea. For example, it may not contain a useful first rule against which a CBOR data item that would match any of the modules may be validated. In other words, when the crawler produces a `[shortname].cddl` file, that means there's no module. If it doesn't, best is to check the module, with "all" being a reserved module name in the spec that gets interpreted to mean "any module". When a spec defines CDDL modules, it may also define CDDL rules that only appear in the "all" file by specifying `data-cddl-module="all"`. This is useful to define a useful first type in the "all" extract.
I made some updates on the way modules are processed following discussions in w3c/webref#1353:
Running a local crawl, I end up with the following list of CDDL extracts:
CDDL defined in other specs (Open Screen Application Protocol, Open Screen Network Protocol, Web Authentication) does not appear because these specs do not yet flag the The extracted CDDL currently contains occurrences of invalid I don't have an immediate solution for validating the CDDL in Webref for now. That would be next step :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few nits
New feature: - Extract CDDL definitions (#1723) If you parse `index.json`, note that, as opposed to other extracts, there may be more than one CDDL extract per spec. As such, the value of the `cddl` property is an array of strings, and not simply a string. Dependencies bumped: - Bump respec from 35.2.0 to 35.2.1 (#1729) - Bump web-specs from 3.29.0 to 3.30.0 (#1730) - Bump puppeteer from 23.10.1 to 23.10.4 (#1727, #1728)
Needed for w3c/webref#1353. Created as draft pull request as I'm not yet fully convinced by the combined extract, and as the
data-cddl-module
convention hasn't been integrated in Bikeshed yet (see speced/bikeshed#2977).With this update, Reffy looks for and extracts CDDL content defined in
<pre class="cddl">
block. The logic is vastly similar to the logic used for IDL. Shared code was factored out accordingly.Something specific about CDDL: on top of generating text extracts, the goal is also to create one extract per CDDL module that the spec defines. To associate a
<pre>
block with one or more CDDL module, the code looks for a possibledata-cddl-module
module, or for module names in theclass
attribute (prefixed bycddl-
or suffixed by-cddl
). The former isn't used by any spec but is the envisioned mechanism in Bikeshed to define the association, the latter is the convention currently used in the WebDriver BiDi specification.When a spec defines modules, CDDL defined in a
<pre>
block with no explicit module annotation is considered to be defined for all modules (not doing so would essentially mean the CDDL would not be defined for any module, which seems weird).When there is CDDL, the extraction produces:
cddl/[shortname].cddl
cddl/[shortname]-[modulename].cddl
(I'm going to assume that no one is ever going to define a module name that would make[shortname]-[modulename]
collide with the shortname of another spec).Note: some specs that define CDDL do not flag the
<pre>
blocks in any way (Open Screen Protocol, WebAuthn). Extraction won't work for them for now. Also, there are a couple of places in the WebDriver BiDi spec that use a<pre class="cddl">
block to reference a CDDL construct defined elsewhere. Extraction will happily include these references as well, leading to CDDL extracts that contain invalid CDDL. These need fixing in the specs.