Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract CDDL for WebDriver BiDi #1353

Closed
gsnedders opened this issue Sep 26, 2024 · 23 comments
Closed

Extract CDDL for WebDriver BiDi #1353

gsnedders opened this issue Sep 26, 2024 · 23 comments

Comments

@gsnedders
Copy link
Member

Currently we have multiple specs defining WebDriver BiDi endpoints; https://w3c.github.io/webdriver-bidi/ itself does, but also https://w3c.github.io/permissions/#automation-webdriver-bidi and https://webbluetoothcg.github.io/web-bluetooth/#automated-testing.

We have some code to extract CDDL from the WebDriver BiDi code, see https://github.com/w3c/webdriver-bidi/blob/3b1939c2380b9ba12cc123cef03c1e3af15fa01d/scripts/cddl/utils.js#L13, and generates three CDDL files ultimately: local.cddl, remote.cddl, all.cddl.

We probably want to pull out all of local-cddl and remote-cddl separately; all is just the union of both of these.

See also: w3c/webdriver-bidi#791

@tidoust
Copy link
Member

tidoust commented Sep 26, 2024

I'm not sure how to approach the "multiple extracts per spec" need (local and remote for WebDriver Bidi). The crawler generates one extract per spec. At a minimum, we'd need some sort of generic convention to understand what bucket to use for a given definition.

I note that the Open Screen Protocol also defines CDDL. In the case of the Open Screen Protocol, CDDL is actually maintained in a .cddl file that gets converted to an HTML file through a script upon build and imported in the spec (interestingly, the resulting spec does not contain any semantic info that says "this is CDDL").

If we start extracting CDDL from specs, it would be useful to also think about some validation mechanism.

@gsnedders
Copy link
Member Author

(I also guess I maybe should've filed this against reffy rather than webref)

@tidoust
Copy link
Member

tidoust commented Sep 26, 2024

Another spec that defines CDDL is WebAuthn. There, the CDDL is not flagged in any particulary way (just a <pre>). I don't see other specs in browser-specs that reference RFC8610.

@gsnedders
Copy link
Member Author

I wonder if we need to add an extra class for WebDriver BiDi CDDL?

@tidoust
Copy link
Member

tidoust commented Oct 4, 2024

That does not change the problem at hand, but just to draw a more complete picture, in an ideal world, authoring tools would also support something like a <pre class=cddl> section that would:

  1. Turn on syntax highlighting (CDDL syntax highlighting is already supported in Bikeshed provided one adds highlight="cddl", but does not seem supported by highlight.js that ReSpec uses)
  2. Create definitions and auto-link terms, as done in Web IDL blocks, noting that there is no specific definition type that may be used for CDDL terms right now.
  3. Allow for the automatic creation of an index at the end of the specification.

For Bikeshed, that's tracked in speced/bikeshed#2072. The issue notes the need to split the index into two (and the need for Webref to extract the CDDL).

@tidoust
Copy link
Member

tidoust commented Dec 3, 2024

Getting back to this...

@gsnedders What is the use case for the all.cddl file?

It makes sense to me for the crawler to produce local.cddl and remote.cddl extracts. I'm wondering about the rationale for producing the all.cddl extract. Is it useful for implementations, testing purpose, or something else?

@gsnedders
Copy link
Member Author

@tidoust That's a good question; I listed it because it already existed, seemingly from w3c/webdriver-bidi#465.

@OrKoN @jgraham @whimboo do you know?

@OrKoN
Copy link

OrKoN commented Dec 3, 2024

we do use all.cddl to generate TypeScript types and validation code in chromium-bidi. I believe all.cddl helps to avoid duplication of definitions which are marked as both local and remote.

@tidoust
Copy link
Member

tidoust commented Dec 3, 2024

OK, thanks. If that file is useful, I'll try to re-create it in Webref, then.

While you're all watching, feel free to have a look at speced/bikeshed#2977, a proposal to add CDDL support to Bikeshed.

Support would add autolinking capabilities within CDDL blocks and generate a CDDL index at the end of the spec. Currently, this would generate a CDDL index per module (building on a convention similar to the one used in WebDriver BiDi). In the case of WebDriver BiDi, the CDDL index would thus list two modules, one with the contents of local.cddl and one with the contents of remote.cddl. If you feel that the spec itself should also contain a CDDL index with all definitions, let me know! I'm thinking it could be generated by the crawler otherwise.

Alternatively, it could also contain 3 indexes:

  • One with all CDDL definitions that are common to all modules
  • One with all local end definitions
  • One with all remote end definitions

I'm not sure how to generalize this approach to a spec that defines more than two CDDL modules though (combinatorial explosion), but perhaps it just doesn't make sense for a spec to define more than 2 CDDL modules at once.

@whimboo
Copy link

whimboo commented Dec 4, 2024

I've a question regarding CI checks when certain specifications are changing their CDDL definitions. Is there a way to actually cross-check if we can still generate the files or if failures occur? It's basically out of our hand what other specifications are doing, and we not necessarily want to see our final CDDL export broken.

@tidoust
Copy link
Member

tidoust commented Dec 4, 2024

There are two data levels in Webref:

  • the main branch contains raw data extracted from specs. That comes with no guarantee whatsoever. If a spec that defines CDDL contains invalid CDDL, that invalid CDDL will be in the main branch.
  • the curated branch contains a curated version of the raw data. That branch is only updated when guarantees are met. When guarantees are broken, we patch the data manually to resume updates. Now, we'll need to agree on guarantees for CDDL. I don't really know what "generate the files" means. I suppose that we'll want to run a CDDL validator (cddl-rs?) against the extracts and guarantee that all CDDL extracts pass validation.

@tidoust
Copy link
Member

tidoust commented Dec 5, 2024

Oh, and there's a third level I completely forgot to mention:

  • Updates are manually reviewed on a weekly basis (on average) to release npm packages: @webref/css, @webref/idl, @webref/events. The manual review allows us to catch additional consistency issues that cannot easily be checked automatically. We could envision a @webref/cddl package if that seems useful.

@whimboo
Copy link

whimboo commented Dec 5, 2024

* the `curated` branch contains a curated version of the raw data. That branch is only updated when guarantees are met. When guarantees are broken, we patch the data manually to resume updates. Now, we'll need to agree on guarantees for CDDL. I don't really know what "generate the files" means. I suppose that we'll want to run a CDDL validator ([cddl-rs](https://github.com/anweiss/cddl)?) against the extracts and guarantee that all CDDL extracts pass validation.

Yes, that’s exactly what I was referring to. We already perform such validation in our GitHub Actions for WebDriver BiDi to ensure that a PR doesn’t break the exported CDDL. However, I’m still curious — if someone were to make a PR in, for example, the Permissions API that breaks the CDDL due to a conflict with the CDDL definitions from the WebAuthn spec, is there a way for WebRef to generate the final CDDL and flag a failure if it encounters an issue? Maybe each of the specs that export a CDDL for BiDi would need an external trigger for WebRef?

* Updates are manually reviewed on a weekly basis (on average) to release npm packages: [`@webref/css`](https://www.npmjs.com/package/@webref/css), [`@webref/idl`](https://www.npmjs.com/package/@webref/idl), [`@webref/events`](https://www.npmjs.com/package/@webref/events). The manual review allows us to catch additional consistency issues that cannot easily be checked automatically. We could envision a `@webref/cddl` package if that seems useful.

I assume that would be fine, as it’s highly unlikely that a new API could be implemented and shipped in any browser within a week. Having a guaranteed error-free CDDL that WebDriver BiDi clients can use for code generation would be very helpful. I assume such a cddl package could also support including different CDDL definitions, allowing other specs not tied to WebDriver to use it as well.

@OrKoN
Copy link

OrKoN commented Dec 6, 2024

The current plan to only add the local and remote indexes sounds good to me. One concern will be that there are currently 38 types that are both local and remote so those 38 types would be duplicated in both lists? It might be more useful to generate a single index based on all.cddl but have some way to annotate for each type if it is local, remote or both.

@tidoust
Copy link
Member

tidoust commented Dec 6, 2024

However, I’m still curious — if someone were to make a PR in, for example, the Permissions API that breaks the CDDL due to a conflict with the CDDL definitions from the WebAuthn spec, is there a way for WebRef to generate the final CDDL and flag a failure if it encounters an issue? Maybe each of the specs that export a CDDL for BiDi would need an external trigger for WebRef?

I wasn't necessarily planning to add logic to Webref to combine and check CDDL exports across specs initially.

As opposed to IDL where all specs share one single namespace, CDDL defined in different specs typically lie in different namespaces. To add checks in Webref, we would need to know which CDDL extracts to combine. I'd prefer that to be defined in the specs themselves (we try to minimize the number of things that we have to maintain manually in Webref, data patching and manual reviews already take time).

There is ongoing work on import statements for CDDL. But the import statements would need to be in the WebDriver BiDi spec itself, which seems to be going in the wrong direction for an extension mechanism. Ideally, we'd need a spec convention that the Permissions API could use to say "the CDDL defined here extends the one defined in WebDriver BiDi" and, more importantly, that we could detect and leverage when we crawl the spec.

@tidoust
Copy link
Member

tidoust commented Dec 6, 2024

The current plan to only add the local and remote indexes sounds good to me. Once concern will be that there are currently 38 types that are both local and remote so those 38 types would be duplicated in both lists?

Yes.

It might be more useful to generate a single index based on all.cddl but have some way to annotate for each type if it is local, remote or both.

Maybe :) I guess I don't really know how the indexes in the spec are going to be used. It seemed more useful to me at first glance to separate the CDDL modules completely, instead of having a set of interleaved CDDL rules that apply to either or both of them with repeated comments such as # Defined for: remote end at the start of rule definitions. But then the CDDL in WebDriver BiDi is quite long, so it may be better to avoid duplication. Could you suggest an annotation mechanism?

@OrKoN
Copy link

OrKoN commented Dec 6, 2024

Could you suggest an annotation mechanism?

Not really, therefore, I think the current proposal is good.

tidoust added a commit to w3c/reffy that referenced this issue Dec 7, 2024
Needed for w3c/webref#1353.

With this update, Reffy now looks for and extracts CDDL content defined in
`<pre class="cddl">` block. The logic is vastly similar to the logic used for
IDL. Shared code was factored out accordingly.

Something specific about CDDL: on top of generating text extracts, the goal is
also to create one extract per CDDL module that the spec defines. To associate
a `<pre>` block with one or more CDDL module, the code looks for a possible
`data-cddl-module` module, or for module names in the `class` attribute
(prefixed by `cddl-` or suffixed by `-cddl`). The former isn't used by any spec
but is the envisioned mechanism in Bikeshed to define the association, the
latter is the convention currently used in the WebDriver BiDi specification.

When a spec defines modules, CDDL defined in a `<pre>` block with no explicit
module annotation is considered to be defined for all modules (not doing so
would essentially mean the CDDL would not be defined for any module, which
seems weird).

When there is CDDL, the extraction produces:
1. an extract that contains all CDDL definitions: `cddl/[shortname].cddl`
2. one extract per CDDL module: `cddl/[shortname]-[modulename].cddl`
(I'm going to assume that no one is ever going to define a module name that
would make `[shortname]-[modulename]` collide with the shortname of another
spec).

Note: some specs that define CDDL do not flag the `<pre>` blocks in any way
(Open Screen Protocol, WebAuthn). Extraction won't work for them for now. Also,
there are a couple of places in the WebDriver BiDi spec that use a
`<pre class="cddl">` block to *reference* a CDDL construct defined elsewhere.
Extraction will happily include these references as well, leading to CDDL
extracts that contain invalid CDDL. These need fixing in the specs.
@tidoust
Copy link
Member

tidoust commented Dec 9, 2024

@OrKoN Back to the all.cddl file. I understand that's not needed to generate TypeScript types, but something that still bugs me is that the file in itself is not a completely usable CDDL document.

As specified in RFC8610, a CDDL document defines one type ("the one defined by its first rule"). For all.cddl, first rule is Command, so the document essentially defines the CDDL of the remote end. The Message rule does appear in the document, but it is not attached to the first rule and, as such, remains hidden.

As a result, if I give all.cddl to a CDDL validator, it can only be used to validate Command CBOR data items. Of course, it's likely that I'll be able to give the validator the name of the rule to start from and validate Message CBOR data items as well, but having to specify the initial rule is not standard behavior.

If we publish a CDDL document that contains both remote end and local end definitions, I would expect it to be directly useful to validate CBOR data items that match either set of definitions. In short, I would expect all.cddl to start with something like:

Root = Command / Message

The crawler could perhaps generate that rule itself from the first rule names it encounters. The crawler would just need to mint a name for the first rule (Root or something more guaranteed not to clash with another rule name).

Alternatively, it could be the spec's reponsibility to define the root rule. This could perhaps be done in the spec as:

<pre class="cddl remote-cddl">
Root /= Command
Command = ...
</pre>

<pre class="cddl local-cddl">
Root /= Message
Message = ...
</pre>

Does this make sense? Any preferred approach?

@gsnedders
Copy link
Member Author

However, I’m still curious — if someone were to make a PR in, for example, the Permissions API that breaks the CDDL due to a conflict with the CDDL definitions from the WebAuthn spec, is there a way for WebRef to generate the final CDDL and flag a failure if it encounters an issue? Maybe each of the specs that export a CDDL for BiDi would need an external trigger for WebRef?

I don't think we need validation similar to what we have for WebIDL initially, just being able to find all the CDDL is itself already a win.

/test/idl/consistency.js checks consistency for all IDL stuff, but as mentioned this is slightly harder in the CDDL case (there's specs with CDDL which aren't WebDriver BiDi messages, whereas I believe every spec with IDL is IDL for browser interfaces).

Certainly longer term requiring what's in the curated branch to pass similar tests wouldn't be unreasonable, though it will likely require patches similar to what we have for IDL today to deal with cases where specs add contradictory definitions — because that's a problem we have with IDL too.

@OrKoN
Copy link

OrKoN commented Dec 11, 2024

@tidoust with our tooling we validate per command/event so we do not use the entire CDDL document for validation. I think if we want to support this use case the specs should define the root (the first type in the category). In case of all.cddl though defining Root = Command / Message does not seem to be useful for validation since you probably wants to validate incoming separately from outgoing messages.

@tidoust
Copy link
Member

tidoust commented Dec 11, 2024

That seems fine, @OrKoN. From a naming perspective, the current plan was for the crawler to create webdriver-bidi-local.cddl, webdriver-bidi-remote.cddl, and webdriver-bidi.cddl for the entire version. If that version is not going to be useful for validation, I think I would prefer to use webdriver-bidi-all.cddl as a name, so that it's somewhat clearer that this is not the default extract.

Another consequence is more related to spec authoring. Right now, the plan is to consider that <pre class="cddl"> tag that does not specify any module is defined for all modules. This was meant to ease authoring of CDDL common to all modules (and also because, in Bikeshed, I don't really know how to make such CDDL appear in the generated CDDL index). But if we want to allow specs to define something such as Root = Command / Message, such CDDL should probably only go to the entire CDDL extract. I'll change that behavior in Reffy's PR and in the proposed Bikeshed PR.

@tidoust
Copy link
Member

tidoust commented Dec 11, 2024

@gsnedders said:

I don't think we need validation similar to what we have for WebIDL initially, just being able to find all the CDDL is itself already a win.

+1 to the multi-step plan :) I'll finalize pending PRs, but we should be able to start adding CDDL extracts to Webref soon (with no validation) .

tidoust added a commit to w3c/reffy that referenced this issue Dec 13, 2024
* Extract CDDL definitions

Needed for w3c/webref#1353.

With this update, Reffy now looks for and extracts CDDL content defined in
`<pre class="cddl">` block. The logic is vastly similar to the logic used for
IDL. Shared code was factored out accordingly.

Something specific about CDDL: on top of generating text extracts, the goal is
also to create one extract per CDDL module that the spec defines. To associate
a `<pre>` block with one or more CDDL module, the code looks for a possible
`data-cddl-module` module, or for module names in the `class` attribute
(prefixed by `cddl-` or suffixed by `-cddl`). The former isn't used by any spec
but is the envisioned mechanism in Bikeshed to define the association, the
latter is the convention currently used in the WebDriver BiDi specification.

When a spec defines modules, CDDL defined in a `<pre>` block with no explicit
module annotation is considered to be defined for all modules (not doing so
would essentially mean the CDDL would not be defined for any module, which
seems weird).

When there is CDDL, the extraction produces:
1. an extract that contains all CDDL definitions: `cddl/[shortname].cddl`
2. one extract per CDDL module: `cddl/[shortname]-[modulename].cddl`
(I'm going to assume that no one is ever going to define a module name that
would make `[shortname]-[modulename]` collide with the shortname of another
spec).

Note: some specs that define CDDL do not flag the `<pre>` blocks in any way
(Open Screen Protocol, WebAuthn). Extraction won't work for them for now. Also,
there are a couple of places in the WebDriver BiDi spec that use a
`<pre class="cddl">` block to *reference* a CDDL construct defined elsewhere.
Extraction will happily include these references as well, leading to CDDL
extracts that contain invalid CDDL. These need fixing in the specs.

* Change name of "all" extract, allow CDDL defs for it

When a spec defines CDDL modules, the union of all CDDL is now written to a
file named `[shortname]-all.cddl` instead of simply `[shortname].cddl`. This
is meant to make it slightly clearer that the union of all CDDL file is not
necessarily the panacea. For example, it may not contain a useful first rule
against which a CBOR data item that would match any of the modules may be
validated.

In other words, when the crawler produces a `[shortname].cddl` file, that
means there's no module. If it doesn't, best is to check the module, with
"all" being a reserved module name in the spec that gets interpreted to mean
"any module".

When a spec defines CDDL modules, it may also define CDDL rules that only
appear in the "all" file by specifying `data-cddl-module="all"`. This is useful
to define a useful first type in the "all" extract.

* Integrate review feedback
@tidoust
Copy link
Member

tidoust commented Dec 17, 2024

Webref now has CDDL extracts of main specs that define CDDL: AT Driver, Permissions, Web Bluetooth, WebDriver BiDi. To be joined later on by a few other specs once the specs produce suitable <pre class="cddl"> blocks (also when Bikeshed gets support for CDDL).

The CDDL extracts are per module, with an -all extract that contains all CDDL definitions (without duplicates, or at least without duplicates unless the duplicates are in the spec itself).

The CDDL extracts come with no guarantee for now. Most are actually slightly invalid today because the specs currently contain <pre class="cddl"> blocks that reference CDDL constructs instead of defining them (see w3c/webdriver-bidi#831 for WebDriver BiDi).

I'm closing this issue as addressed. I created #1417 to track CDDL validation in Webref so that we may make guarantees in the future. We may also consider creating an @webref/cddl npm package once guarantees are in place if that seems useful.

@tidoust tidoust closed this as completed Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants