-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support CDDL marking and linking #2977
base: main
Are you sure you want to change the base?
Conversation
This makes Bikeshed process CDDL blocks à la `<pre class=cddl>` as described in speced#2072 to: - add highlighting (done by Pygments, the code merely sets the right class) - wrap terms in `<dfn>` and `<a>` automatically, making it possible to define them in, or reference them from, the rest of the prose. Most of the logic is copied from the logic used to process IDL blocks. As opposed to IDL definitions, CDDL definitions are not exported by default as most CDDL definitions only apply to the underlying spec. Support for CDDL definitions means 4 new definition types get introduced: - `cddl-type`: roughly the equivalent of an IDL interface. Type definitions do not have a `data-dfn-for` attribute. - `cddl-key`: roughly the equivalent of an IDL attribute. Key definitions always have a `data-dfn-for` attribute that links back to a CDDL type. - `cddl-value`: roughly the equivalent of an enum-value in IDL. Value definitions always have a `data-dfn-for` attribute that links back to a CDDL type or to a CDDL key. - `cddl-parameter`: used for generic parameter names (noting that no known spec uses CDDL generics for the time being). The definition types are prefixed with `cddl-` to avoid collision with other types used for other purpose (e.g., `value` in CSS). The code also collects CDDL definitions to produce a CDDL index at the end of the spec. To accommodate specs like WebDriver BiDi that define two sets of CDDL (remote end and local end), a mechanism gets added to associate CDDL definitions with a given module: 1. The CDDL module must be defined with a `<dfn>` with a `data-cddl-module` attribute set to a shortname for the CDDL module 2. CDDL blocks must add a `data-cddl-module` attribute set to a comma-separated list of CDDL module shortnames they belong to. If a CDDL block does not have that attribute, the code considers it is defined in all CDDL modules. The index is split per module, using the `<dfn>` text as title for each module. Note: Even when modules are used, CDDL definitions in a spec are part of the same namespace, meaning a `foo` rule cannot be defined differently within a single spec for two different CDDL modules. CDDL parsing is done through a hand-made CDDL parser that follows the CDDL grammar defined in RFC 8610, currently sitting under: https://github.com/tidoust/cddlparser To ease authoring, a new shorthand notation gets introduced to reference CDDL: `{^foo^}` is an autolink to a CDDL definition. FWIW, the shorthand was chosen to mean "shortcut to CDDL code" on the grounds that: - `{}` indicates a code block (for IDL) - `^` means "cut" in CDDL CDDL type definitions can become somewhat convoluted, the code does not attempt to be too smart and won't autolink definitions that are too complex.
`mypy` started reporting type issues as it can now look into the `cddlparser` package. This update bumps the version of `cddlparser` to get better typing info, and adds assertions when appropriate.
Previous version was not fully compatible with Python 3.9 and Python 3.10. The `cddlparser` project now runs tests under multiple versions of Python to guarantee compatibility.
@tabatkins, coding-wise, this should be ready for review. I published the CDDL parser as a Pypi package, adapted it to run with Python >= 3.9, and linted the code here. The parser is also more aligned with the CDDL grammar and reports an error when the CDDL syntax is invalid, which seems a good thing. FYI, there's an issue right now with the WebDriver BiDi in that it uses a If the PR looks reasonable, I'll update the documentation as well. |
New version returns more user-friendly error messages
Cool, I'm gonna delay reviewing this for a bit longer as I'm this close to pushing out the big shorthand parser rewrite, which this interacts with. I'll probably have to manually fix things up for that, but that's fine, I'm happy to do so. |
Yup, looks reasonable overall. I'll need to rewrite the shorthand parser since that's totally changed now, but that's trivial.
The question is just what we actually need to be linkable. Reading the RFC, I agree that "types" are definitely the main useful thing. They have a clear defining location and it's obvious that linking to their occurences in other structures is useful.
Addressed above. I lean towards anonymous values/enumerations not getting dfns.
The shorthand you're using seems reasonable. The important thing is just that we ideally want to avoid having to manually specify the specific link type as much as possible. It looks like cddl-types will always be idents, while cddl-keys, cddl-values, and cddl-params are always going to be "for" a cddl-type. That's a decent bit of possible collision, but it's possible that, like IDL, it's rare for there to be actual name re-use (especially since they're generally spec-local), so it should work fine by default.
Yes, auto-linking them to something in the RFC is good, same as we autolink all the built-ins for WebIDL to the WebIDL spec.
Yes, for assigning blocks of cddl code to a module. I'd prefer the actual module definitions just be a
It doesn't have to go there, but it's not bad to do so either. Then I could poke at it more easily, too. ^_^ |
I'm not 100% sure where "anonymous" starts. Trying to distinguish relevant cases, given:
The code currently creates a defining location for values in both of them. In the first case, it ignores the array and parentheses enclosings because they do not introduce ambiguity from a linking perspective. In the second case, it creates values that are for the To disambiguate, one would need to write Should the code avoid defining locations in both cases? Only the second one? Or were you thinking about something else? This would restrict
The
There aren't many examples of generic parameters used in specifications. I don't think they are being used in any W3C specification for now in particular. I see one occurrence in RFC9581:
I suspect linking is only going to be useful within the definition itself (that is, to link back
Strings and barewords are supported in the same way:
... would result in defining key locations which can be referenced through Non-string keys, and in particular numeric keys, are treated as non-linkable. Some specs use them extensively, such as the Open Screen Protocol, but I cannot think of a good way to create useful defining locations (the use of integer keys is for optimization purpose, to talk about them in the prose, the spec "names" them through a CDDL comment).
No problem introducing a new dfn type. FWIW, I had stuck to "dfn" for three reasons:
|
The CDDL index contained the "local" module twice as a result.
This pulls in the latest changes from `main` and resolves conflicts that arose. In particular, the latest changes include a re-write of the parsers that required to also partially re-write the CDDL shorthand parsing. In practice, as for other changes, CDDL shorthand parsing closely follows the logic used for IDL shorthand parsing.
I thought I'd do it as a way to force me to look into how things work. I believe I managed to adjust the code, at least I get the output I expect ;) The test that fails seems unrelated to my updates (and also fails in |
These both seem reasonable to me.
While this seems reasonable for this case, what about when dfns are 3-deep or more? Like:
Would
Yeah, this is the solution if we're uncomfortable with the inferences, especially for deep nesting.
Okay, that's how I thought they'd probably work. So, the issue is that the RFC doesn't seem to explicitly define the scope of a generic parameter name. I presume it's treated as a type with lexical scope covering the definition only, so two different types could use the same parameter name and it's not ambiguous. This complicates things a little; we need to make sure the ID is disambiguated (should be easy, just include the type name) and be a little more explicit when doing the auto-markup (have the marker know that it's marking a type with a generic parameter, so when it's marking the body it can tag that type explicitly as a parameter for the correct type, rather than just leaving it as a generic "cddl" autolink that might turn out ambiguous). I haven't checked, your code might already do something like this. Or we can just leave it out, since the scope is right there; autolinking is most useful when it's linking to something elsewhere so you don't have to hold a bunch of text in your head that's not visible. But the marker still needs to know that it's a parameter so it doesn't mark it as a cddl-type autolink (which'll then fail, or worse, link to a same-named global type), so I don't think this actually reduces the required work at all.
What I meant is that strings and barewords mean the same thing, afaict, so
Yeah, since we don't have that much CDDL in Bikeshedded specs so far (and none of it is using the new autolinking, so there will be some source editting anyway), just replacing the autolinking would be fine. I'd even group cddl-module under the And as long as the names are clear and distinct, I'm not afraid of multiplying the set of definition types. I should have been a lot more disciplined with naming from the start and used |
Hm, I think this model actually works fairly well. Values and keys are linkable as themselves, or with the full set of ancestor keys up to and including the type. This handles anonymous stuff pretty reasonably, consistent with what you already described. It's possible for it to produce clashing definitions, tho, if key names aren't provided. Like |
Third test file also update to showcase and test more nested shorthand features and ways to disambiguate apparent duplicates.
The code now properly handles references to generic parameters within a type definition, and does not get confused with types that have the same name and that may be defined elsewhere. It remains possible to end up with (insane) CDDL definitions such as: ``` T<T> = { T: T } ``` The IDs are correctly set but there is no good way to disambuigate `{^T/T^}` in such cases. Specs should not really need to reference generic parameters in practice. This update simply disables generic `cddl` links and CDDL shorthand syntax for now. In other words, the `T` generic parameter can only be referenced through `<a cddl-parameter for=T>T</a>`, while `{^T/T^}` targets the `T` key under the `T` type. This could be improved later on if references to generic parameters turn out to be useful in spec prose.
Yes, I added a few cases to
Argh. Indeed. That's still to be addressed.
The code was doing some of that but somewhat badly... Logic now updated. The IDs were scoped to the underlying type, not a problem per se. The code should now correctly detect references to generic parameters within a definition. I added test cases in tests/cddl004.bs. I took the liberty to disable generic
... and there would be no good way to disambiguate between a reference to the generic parameter and a reference to the
Understood. Now done in refs/utls.py, with tests in tests/cddl003.bs :) So, still on the TODO list:
|
Code now detects when a CDDL block contains anonymous types that would create duplicates. It dies when a key or a value is defined twice for the same CDDL type, preventing any reference to the term. It warns when the duplicate is for definitions that are not of the same type. For example, an error is returned for: ``` err = [ "dupl", "dupl" ] ``` ... while the following only returns an error because it remains possible to reference individual terms by making the link type explicit: ``` warn = [ ( dupl: "val" ), "dupl" ] ```
This create a new `cddl-module` dfn type to turn the beskope markup pattern for defining CDDL modules into a more usual mechanism. The shorthand notation can be used to reference CDDL modules as well.
That's a more natural place to put the logic and it simplifies things a bit as well (no need to deal with additional linking texts).
Webref now contains the prelude types defined in RFC 8610, making it possible to link to them automatically from other specs. Note: The updated tests are correct... provided that `spec-data/readonly` gets updated first through `bikeshed debug --refresh-data` (not done as part of this commit to keep updates separated)
Data in `spec-data/readonly` needs to be refreshed for definitions of CDDL types defined in the standard prelude in RFC 8610 to start becoming available for cross-referencing purpose. In the meantime, this update rebases the tests that include CDDL dfns. Build typically reports link errors because of the unavailable RFC 8610 dfns. These errors will disappear once data in `spec-data/readonly` gets refreshed and CDDL tests rebased again.
These should now be done! One thing is that, for links to RFC8610 to start working:
I didn't do the above as part of this PR because refreshing Side note: I ended up treating barewords and strings within the |
I'm creating this pull request as a draft because there are still a couple of things missing. Before I finalize the PR, I'd appreciate feedback on whether this does what people want for #2072, and whether it does it in a way that works for everyone. See below for more specific questions.
TL;DR
This adds:
{^foo^}
Implementation notes
I copied most of the logic from the code used to process IDL blocks.
As opposed to IDL definitions, CDDL definitions are not exported by default because CDDL is typically not imported/exported between specs (there is ongoing work on a CDDL module structure at IETF, but that would be done through explicit "import" statements in any case).
Support for CDDL definitions means 4 new definition types get introduced:
cddl-type
: roughly the equivalent of an IDL interface. Type definitions do not have adata-dfn-for
attribute.cddl-key
: roughly the equivalent of an IDL attribute. Key definitions always have adata-dfn-for
attribute that links back to a CDDL type.cddl-value
: roughly the equivalent of an enum-value in IDL. Value definitions always have adata-dfn-for
attribute that links back to a CDDL type or to a CDDL key.cddl-parameter
: used for generic parameter names (no known spec uses CDDL generics for the time being).The definition types are prefixed with
cddl-
to avoid collision with other types used for other purpose (e.g.,value
in CSS).Once we start having CDDL definitions in the cross-references database, it may become useful to add a fifth type
cddl-operator
to cover and link operators that may appear in definition types such as.default
or.within
and link back to their definition in RFC 8610.The code collects CDDL definitions to produce a CDDL index at the end of the spec. To accommodate specs like WebDriver BiDi that define two sets of CDDL (remote end and local end), it is possible to associate CDDL definitions with one or mode CDDL modules:
<dfn>
with adata-cddl-module
attribute set to a shortname for the CDDL module.data-cddl-module
attribute set to a comma-separated list of CDDL module shortnames they belong to. If a CDDL block does not have that attribute, the code considers it is defined in all CDDL modules.For example:
The CDDL index is split per module, using the
<dfn>
text as title for each module.Note: Even when modules are used, CDDL definitions in a spec are part of a single namespace, meaning that a
foo
rule cannot be defined differently within a single spec for two different CDDL modules. I believe that's fine for all specs that define CDDL, and that keeps the autolinking logic simple.CDDL parsing is done through a hand-made CDDL parser that follows the CDDL
grammar defined in RFC 8610, currently available as tidoust/cddlparser.
That parser started as a port from the
cddl
Node.js parser but has now deviated significantly from it to stay closer to the grammar, and allow re-serialization and marking up of the tree in a way that preserves whitespaces and comments. I tested the parser against CDDL extracts from IETF RFCs and a few W3C specs.To ease authoring, a new shorthand notation gets introduced to reference CDDL:
{^foo^}
is an autolink to a CDDL definition. FWIW, the shorthand was chosento mean "shortcut for an autolink to CDDL code" on the grounds that:
{}
would hint at a code block (as for IDL blocks)^
means "cut" in CDDL. Granted, the notion of "cut" in CDDL has nothing to do with a shorthand. Anyway.CDDL type definitions can become somewhat arbitrarily convoluted, the code does not attempt to be too smart and won't autolink definitions that are too complex.
CDDL defines a number of types in the standard prelude, such as
int
,tstr
, orbool
. There is no more specific anchor that may be used and the code simply does not link these types back to RFC 8610 for now.Still missing
Before the PR can be merged, I still need to:
Publish the parser as a Pypi package (once I've figured out how to do that ;)) so that it may be properly imported. The proposed code won't work on any other computer than mine for now due to the use of a local import. This problem also explains current linter failures.Edit:cddlparser
package published and tested with Python >=3.9Update the doc to reflect the CDDL capabilities.Edit: done in last commit(Editorial updates may also be needed afterwards to specs that currently define CDDL. I'm happy to help with that.)
Questions
cddl
type without creating more specific ones? (CDDL really is all about "types", the nuance between different constructs is not always super explicit)"browser.close"
in WebDriver BiDi). Is it too much?data-cddl-module
mechanism to split CDDL per module in the CDDL index suitable?