Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract CDDL definitions #1723

Merged
merged 3 commits into from
Dec 13, 2024
Merged

Extract CDDL definitions #1723

merged 3 commits into from
Dec 13, 2024

Conversation

tidoust
Copy link
Member

@tidoust tidoust commented Dec 9, 2024

Needed for w3c/webref#1353. Created as draft pull request as I'm not yet fully convinced by the combined extract, and as the data-cddl-module convention hasn't been integrated in Bikeshed yet (see speced/bikeshed#2977).

With this update, Reffy looks for and extracts CDDL content defined in <pre class="cddl"> block. The logic is vastly similar to the logic used for IDL. Shared code was factored out accordingly.

Something specific about CDDL: on top of generating text extracts, the goal is also to create one extract per CDDL module that the spec defines. To associate a <pre> block with one or more CDDL module, the code looks for a possible data-cddl-module module, or for module names in the class attribute (prefixed by cddl- or suffixed by -cddl). The former isn't used by any spec but is the envisioned mechanism in Bikeshed to define the association, the latter is the convention currently used in the WebDriver BiDi specification.

When a spec defines modules, CDDL defined in a <pre> block with no explicit module annotation is considered to be defined for all modules (not doing so would essentially mean the CDDL would not be defined for any module, which seems weird).

When there is CDDL, the extraction produces:

  1. an extract that contains all CDDL definitions: cddl/[shortname].cddl
  2. one extract per CDDL module: cddl/[shortname]-[modulename].cddl (I'm going to assume that no one is ever going to define a module name that would make [shortname]-[modulename] collide with the shortname of another spec).

Note: some specs that define CDDL do not flag the <pre> blocks in any way (Open Screen Protocol, WebAuthn). Extraction won't work for them for now. Also, there are a couple of places in the WebDriver BiDi spec that use a <pre class="cddl"> block to reference a CDDL construct defined elsewhere. Extraction will happily include these references as well, leading to CDDL extracts that contain invalid CDDL. These need fixing in the specs.

Needed for w3c/webref#1353.

With this update, Reffy now looks for and extracts CDDL content defined in
`<pre class="cddl">` block. The logic is vastly similar to the logic used for
IDL. Shared code was factored out accordingly.

Something specific about CDDL: on top of generating text extracts, the goal is
also to create one extract per CDDL module that the spec defines. To associate
a `<pre>` block with one or more CDDL module, the code looks for a possible
`data-cddl-module` module, or for module names in the `class` attribute
(prefixed by `cddl-` or suffixed by `-cddl`). The former isn't used by any spec
but is the envisioned mechanism in Bikeshed to define the association, the
latter is the convention currently used in the WebDriver BiDi specification.

When a spec defines modules, CDDL defined in a `<pre>` block with no explicit
module annotation is considered to be defined for all modules (not doing so
would essentially mean the CDDL would not be defined for any module, which
seems weird).

When there is CDDL, the extraction produces:
1. an extract that contains all CDDL definitions: `cddl/[shortname].cddl`
2. one extract per CDDL module: `cddl/[shortname]-[modulename].cddl`
(I'm going to assume that no one is ever going to define a module name that
would make `[shortname]-[modulename]` collide with the shortname of another
spec).

Note: some specs that define CDDL do not flag the `<pre>` blocks in any way
(Open Screen Protocol, WebAuthn). Extraction won't work for them for now. Also,
there are a couple of places in the WebDriver BiDi spec that use a
`<pre class="cddl">` block to *reference* a CDDL construct defined elsewhere.
Extraction will happily include these references as well, leading to CDDL
extracts that contain invalid CDDL. These need fixing in the specs.
When a spec defines CDDL modules, the union of all CDDL is now written to a
file named `[shortname]-all.cddl` instead of simply `[shortname].cddl`. This
is meant to make it slightly clearer that the union of all CDDL file is not
necessarily the panacea. For example, it may not contain a useful first rule
against which a CBOR data item that would match any of the modules may be
validated.

In other words, when the crawler produces a `[shortname].cddl` file, that
means there's no module. If it doesn't, best is to check the module, with
"all" being a reserved module name in the spec that gets interpreted to mean
"any module".

When a spec defines CDDL modules, it may also define CDDL rules that only
appear in the "all" file by specifying `data-cddl-module="all"`. This is useful
to define a useful first type in the "all" extract.
@tidoust
Copy link
Member Author

tidoust commented Dec 12, 2024

I made some updates on the way modules are processed following discussions in w3c/webref#1353:

  • When modules are used, the extract that contains all CDDL definitions is now named [shortname]-all.cddl. This is meant to avoid ending up with [shortname].cddl that looks like the right entry point. That collection of CDDL definitions may well not be directly usable for validation of arbitrary CBOR data items (first rule will likely be for one of the modules).
  • Specs can now also define CDDL definitions specifically for the "all" module through a data-cddl-module="all" attribute. This makes it possible to define a root rule in the spec. Side note: such CDDL definitions would not appear in the generated CDDL index that Bikeshed would generate if support for CDDL gets added. I think that's fine for now, I don't see any reason to define additional CDDL definitions for the "all" module on top of creating a root rule.

Running a local crawl, I end up with the following list of CDDL extracts:

  • at-driver-all.cddl
  • at-driver-local.cddl
  • at-driver-remote.cddl
  • permissions-all.cddl
  • permissions-local.cddl
  • permissions-remote.cddl
  • web-bluetooth-all.cddl
  • web-bluetooth-local.cddl
  • web-bluetooth-remote.cddl
  • webdriver-bidi-all.cddl
  • webdriver-bidi-local.cddl
  • webdriver-bidi-remote.cddl

CDDL defined in other specs (Open Screen Application Protocol, Open Screen Network Protocol, Web Authentication) does not appear because these specs do not yet flag the <pre> tags that contain the CDDL in any way.

The extracted CDDL currently contains occurrences of invalid EmptyResult constructs. That's because the specs currently also use a <pre class="cddl"> tag to reference the rule, (e.g., session.end). That's not an extraction bug, specs need to be fixed.

I don't have an immediate solution for validating the CDDL in Webref for now. That would be next step :)

@tidoust tidoust marked this pull request as ready for review December 12, 2024 06:27
Copy link
Member

@dontcallmedom dontcallmedom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few nits

src/browserlib/get-code-elements.mjs Outdated Show resolved Hide resolved
src/browserlib/get-code-elements.mjs Outdated Show resolved Hide resolved
src/browserlib/trim-spaces.mjs Outdated Show resolved Hide resolved
src/browserlib/extract-cddl.mjs Outdated Show resolved Hide resolved
@tidoust tidoust merged commit fc8c944 into main Dec 13, 2024
1 check passed
@tidoust tidoust deleted the cddl branch December 13, 2024 09:17
tidoust added a commit that referenced this pull request Dec 16, 2024
New feature:
- Extract CDDL definitions (#1723)

If you parse `index.json`, note that, as opposed to other extracts, there may
be more than one CDDL extract per spec. As such, the value of the `cddl`
property is an array of strings, and not simply a string.

Dependencies bumped:
- Bump respec from 35.2.0 to 35.2.1 (#1729)
- Bump web-specs from 3.29.0 to 3.30.0 (#1730)
- Bump puppeteer from 23.10.1 to 23.10.4 (#1727, #1728)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants