Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CDDL marking and linking #2977

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Support CDDL marking and linking #2977

wants to merge 20 commits into from

Conversation

tidoust
Copy link
Contributor

@tidoust tidoust commented Dec 3, 2024

I'm creating this pull request as a draft because there are still a couple of things missing. Before I finalize the PR, I'd appreciate feedback on whether this does what people want for #2072, and whether it does it in a way that works for everyone. See below for more specific questions.

TL;DR

This adds:

  1. CDDL autolinking capabilities (and highlighting but that was sort of already the case).
  2. A new shorthand notation for CDDL autolinks: {^foo^}
  3. A CDDL index, generated at the end of the spec, split per CDDL module when needed.

Implementation notes

I copied most of the logic from the code used to process IDL blocks.

As opposed to IDL definitions, CDDL definitions are not exported by default because CDDL is typically not imported/exported between specs (there is ongoing work on a CDDL module structure at IETF, but that would be done through explicit "import" statements in any case).

Support for CDDL definitions means 4 new definition types get introduced:

  • cddl-type: roughly the equivalent of an IDL interface. Type definitions do not have a data-dfn-for attribute.
  • cddl-key: roughly the equivalent of an IDL attribute. Key definitions always have a data-dfn-for attribute that links back to a CDDL type.
  • cddl-value: roughly the equivalent of an enum-value in IDL. Value definitions always have a data-dfn-for attribute that links back to a CDDL type or to a CDDL key.
  • cddl-parameter: used for generic parameter names (no known spec uses CDDL generics for the time being).

The definition types are prefixed with cddl- to avoid collision with other types used for other purpose (e.g., value in CSS).

Once we start having CDDL definitions in the cross-references database, it may become useful to add a fifth type cddl-operator to cover and link operators that may appear in definition types such as .default or .within and link back to their definition in RFC 8610.

The code collects CDDL definitions to produce a CDDL index at the end of the spec. To accommodate specs like WebDriver BiDi that define two sets of CDDL (remote end and local end), it is possible to associate CDDL definitions with one or mode CDDL modules:

  1. Any CDDL module must be defined somewhere in the spec with a <dfn> with a data-cddl-module attribute set to a shortname for the CDDL module.
  2. CDDL blocks must add a data-cddl-module attribute set to a comma-separated list of CDDL module shortnames they belong to. If a CDDL block does not have that attribute, the code considers it is defined in all CDDL modules.

For example:

The <dfn data-cddl-module="local">local end definitions</dfn> contain foo.
The <dfn data-cddl-module="remote">remote end definitions</dfn> contain bar.

<xmp class=cddl data-cddl-module=local>
LocalCommand = {
  param: common
}
</xmp>

<xmp class=cddl data-cddl-module=remote>
RemoteCommand = {
  param: common
}
</xmp>

<xmp class=cddl data-cddl-module="local,remote">
common = any
</xmp>

The CDDL index is split per module, using the <dfn> text as title for each module.

Note: Even when modules are used, CDDL definitions in a spec are part of a single namespace, meaning that a foo rule cannot be defined differently within a single spec for two different CDDL modules. I believe that's fine for all specs that define CDDL, and that keeps the autolinking logic simple.

CDDL parsing is done through a hand-made CDDL parser that follows the CDDL
grammar defined in RFC 8610, currently available as tidoust/cddlparser.

That parser started as a port from the cddl Node.js parser but has now deviated significantly from it to stay closer to the grammar, and allow re-serialization and marking up of the tree in a way that preserves whitespaces and comments. I tested the parser against CDDL extracts from IETF RFCs and a few W3C specs.

To ease authoring, a new shorthand notation gets introduced to reference CDDL:
{^foo^} is an autolink to a CDDL definition. FWIW, the shorthand was chosen
to mean "shortcut for an autolink to CDDL code" on the grounds that:

  • {} would hint at a code block (as for IDL blocks)
  • ^ means "cut" in CDDL. Granted, the notion of "cut" in CDDL has nothing to do with a shorthand. Anyway.

CDDL type definitions can become somewhat arbitrarily convoluted, the code does not attempt to be too smart and won't autolink definitions that are too complex.

CDDL defines a number of types in the standard prelude, such as int, tstr, or bool. There is no more specific anchor that may be used and the code simply does not link these types back to RFC 8610 for now.

Still missing

Before the PR can be merged, I still need to:

  • Publish the parser as a Pypi package (once I've figured out how to do that ;)) so that it may be properly imported. The proposed code won't work on any other computer than mine for now due to the use of a local import. This problem also explains current linter failures. Edit: cddlparser package published and tested with Python >=3.9
  • Update the doc to reflect the CDDL capabilities. Edit: done in last commit

(Editorial updates may also be needed afterwards to specs that currently define CDDL. I'm happy to help with that.)

Questions

  • Does the overall approach look sound?
  • What about the new CDDL definition types? Should we rather only have a generic cddl type without creating more specific ones? (CDDL really is all about "types", the nuance between different constructs is not always super explicit)
  • It seemed useful to autolink enumeration values as done for IDL, but then there's nothing that distinguishes an enumeration from a single literal text value in CDDL, so the code will end up autolinking single values as well (things like the "browser.close" in WebDriver BiDi). Is it too much?
  • What about the new shorthand syntax? Should another shorthand be used? Should we start without a shorthand?
  • Should standard types link back to the prelude section in RFC 8610? Ideally, they would become available as exported definitions in the cross-references database, but again the problem is that there is no specific anchor in the RFC for them. Having links back to the RFC would also add it to the list of normative references, which would be a good thing as well.
  • Is the data-cddl-module mechanism to split CDDL per module in the CDDL index suitable?
  • Regarding the parser, would it be a good idea to move it under the speced organization on GitHub (provided the Speced CG agrees with it)? I ended up writing it because I didn't see any immediate way to proceed without a parser but then I'm neither good at writing Python code nor an expert in CDDL, so I'd happily share that mostly-shaven yak ;) I'm happy to maintain the code though.
  • Anything I forgot?

This makes Bikeshed process CDDL blocks à la `<pre class=cddl>` as described
in speced#2072 to:
- add highlighting (done by Pygments, the code merely sets the right class)
- wrap terms in `<dfn>` and `<a>` automatically, making it possible to
define them in, or reference them from, the rest of the prose.

Most of the logic is copied from the logic used to process IDL blocks.

As opposed to IDL definitions, CDDL definitions are not exported by default as
most CDDL definitions only apply to the underlying spec.

Support for CDDL definitions means 4 new definition types get introduced:
- `cddl-type`: roughly the equivalent of an IDL interface. Type definitions do
not have a `data-dfn-for` attribute.
- `cddl-key`: roughly the equivalent of an IDL attribute. Key definitions
always have a `data-dfn-for` attribute that links back to a CDDL type.
- `cddl-value`: roughly the equivalent of an enum-value in IDL. Value
definitions always have a `data-dfn-for` attribute that links back to a CDDL
type or to a CDDL key.
- `cddl-parameter`: used for generic parameter names (noting that no known spec
uses CDDL generics for the time being).

The definition types are prefixed with `cddl-` to avoid collision with other
types used for other purpose (e.g., `value` in CSS).

The code also collects CDDL definitions to produce a CDDL index at the end of
the spec. To accommodate specs like WebDriver BiDi that define two sets of
CDDL (remote end and local end), a mechanism gets added to associate CDDL
definitions with a given module:
1. The CDDL module must be defined with a `<dfn>` with a `data-cddl-module`
attribute set to a shortname for the CDDL module
2. CDDL blocks must add a `data-cddl-module` attribute set to a comma-separated
list of CDDL module shortnames they belong to. If a CDDL block does not have
that attribute, the code considers it is defined in all CDDL modules.

The index is split per module, using the `<dfn>` text as title for each module.

Note: Even when modules are used, CDDL definitions in a spec are part of the
same namespace, meaning a `foo` rule cannot be defined differently within a
single spec for two different CDDL modules.

CDDL parsing is done through a hand-made CDDL parser that follows the CDDL
grammar defined in RFC 8610, currently sitting under:
 https://github.com/tidoust/cddlparser

To ease authoring, a new shorthand notation gets introduced to reference CDDL:
`{^foo^}` is an autolink to a CDDL definition. FWIW, the shorthand was chosen
to mean "shortcut to CDDL code" on the grounds that:
- `{}` indicates a code block (for IDL)
- `^` means "cut" in CDDL

CDDL type definitions can become somewhat convoluted, the code does not
attempt to be too smart and won't autolink definitions that are too complex.
`mypy` started reporting type issues as it can now look into the `cddlparser`
package. This update bumps the version of `cddlparser` to get better typing
info, and adds assertions when appropriate.
Previous version was not fully compatible with Python 3.9 and Python 3.10. The
`cddlparser` project now runs tests under multiple versions of Python to
guarantee compatibility.
@tidoust
Copy link
Contributor Author

tidoust commented Dec 12, 2024

@tabatkins, coding-wise, this should be ready for review.

I published the CDDL parser as a Pypi package, adapted it to run with Python >= 3.9, and linted the code here. The parser is also more aligned with the CDDL grammar and reports an error when the CDDL syntax is invalid, which seems a good thing.

FYI, there's an issue right now with the WebDriver BiDi in that it uses a <pre class="cddl"> block to reference the CCDL rule EmptyResult defined elsewhere, creating an invalid CDDL block. That explains why the test update for WebDriver BiDi features a "CDDL syntax error" message.

If the PR looks reasonable, I'll update the documentation as well.

@tidoust tidoust requested a review from tabatkins December 12, 2024 12:33
New version returns more user-friendly error messages
@tidoust tidoust marked this pull request as ready for review January 6, 2025 16:49
@tabatkins
Copy link
Collaborator

Cool, I'm gonna delay reviewing this for a bit longer as I'm this close to pushing out the big shorthand parser rewrite, which this interacts with. I'll probably have to manually fix things up for that, but that's fine, I'm happy to do so.

@tabatkins
Copy link
Collaborator

Does the overall approach look sound?

Yup, looks reasonable overall. I'll need to rewrite the shorthand parser since that's totally changed now, but that's trivial.

What about the new CDDL definition types? Should we rather only have a generic cddl type without creating more specific ones? (CDDL really is all about "types", the nuance between different constructs is not always super explicit)

The question is just what we actually need to be linkable. Reading the RFC, I agree that "types" are definitely the main useful thing. They have a clear defining location and it's obvious that linking to their occurences in other structures is useful.

  • Some values have an obvious defining location, when they're immediately assigned to a type, like valid-strings = "foo" / "bar". These would be for=valid-strings, linkable with {^"foo"^} (possibly ambiguous) or {^valid-strings/"foo"^} (should be unambiguous).
  • Other values don't, as part of anonymous types, like in [("foo" / "bar")]. It's not immediately clear to me that we should be treating these as a defining location (what if they're used in multiple spots?), and I don't know how we'd give them a disambiguating "for" value. Are these useful to link to? Possibly we should just say "if you want it linkable, create a type for it"
  • I can't actually tell what generic parameters are or what they look like, from reading the RFC. It really just glosses over them. Is it useful to be able to link to them?
  • Keys seem reasonable to link, but there's some complexities. First, we should make sure that string and bare-words are canonicalized together. Second, what do we do about non-string/numeric keys? They can be other types, or arbitrarily complex anonymous types. Maybe just treat those as unlinkable?

It seemed useful to autolink enumeration values as done for IDL, but then there's nothing that distinguishes an enumeration from a single literal text value in CDDL, so the code will end up autolinking single values as well (things like the "browser.close" in WebDriver BiDi). Is it too much?

Addressed above. I lean towards anonymous values/enumerations not getting dfns.

What about the new shorthand syntax? Should another shorthand be used? Should we start without a shorthand?

The shorthand you're using seems reasonable. The important thing is just that we ideally want to avoid having to manually specify the specific link type as much as possible. It looks like cddl-types will always be idents, while cddl-keys, cddl-values, and cddl-params are always going to be "for" a cddl-type. That's a decent bit of possible collision, but it's possible that, like IDL, it's rare for there to be actual name re-use (especially since they're generally spec-local), so it should work fine by default.

Should standard types link back to the prelude section in RFC 8610? Ideally, they would become available as exported definitions in the cross-references database, but again the problem is that there is no specific anchor in the RFC for them. Having links back to the RFC would also add it to the list of normative references, which would be a good thing as well.

Yes, auto-linking them to something in the RFC is good, same as we autolink all the built-ins for WebIDL to the WebIDL spec.

Is the data-cddl-module mechanism to split CDDL per module in the CDDL index suitable?

Yes, for assigning blocks of cddl code to a module. I'd prefer the actual module definitions just be a cddl-module dfn type, rather than a bespoke markup pattern.

Regarding the parser, would it be a good idea to move it under the speced organization on GitHub (provided the Speced CG agrees with it)? I ended up writing it because I didn't see any immediate way to proceed without a parser but then I'm neither good at writing Python code nor an expert in CDDL, so I'd happily share that mostly-shaven yak ;) I'm happy to maintain the code though.

It doesn't have to go there, but it's not bad to do so either. Then I could poke at it more easily, too. ^_^

@tidoust
Copy link
Contributor Author

tidoust commented Jan 14, 2025

Other values don't, as part of anonymous types, like in [("foo" / "bar")]. It's not immediately clear to me that we should be treating these as a defining location (what if they're used in multiple spots?), and I don't know how we'd give them a disambiguating "for" value. Are these useful to link to? Possibly we should just say "if you want it linkable, create a type for it"

I'm not 100% sure where "anonymous" starts. Trying to distinguish relevant cases, given:

anon = [("foo" / "bar")]
mytype = {
  anon: "foo" / "thing"
}

The code currently creates a defining location for values in both of them. In the first case, it ignores the array and parentheses enclosings because they do not introduce ambiguity from a linking perspective. In the second case, it creates values that are for the anon key, knowing that the key is, in turn, for the mytype type.

To disambiguate, one would need to write {^anon/"foo"^} to target the first definition and {^mytype/anon/"foo"^} to target the second definition. In particular, the second definition is two-level-deep, and can either be referenced with no level (as in {^"thing"^}) or with both levels (as in {^mytype/anon/"thing"^}). {^anon/"thing"^} would result in a linking error (which seems good, as it means {^anon/"foo"^} is not ambiguous).

Should the code avoid defining locations in both cases? Only the second one? Or were you thinking about something else?

This would restrict cddl-value definitions to cases where the values are for a cddl-type definition. For example, the second case appears in WebDriver-BiDi in the definition of browser.ClientWindowInfo/state:

browser.ClientWindowInfo = {
  ...
  state: "fullscreen" / "maximized" / "minimized" / "normal",
  ...
}

The state enumeration values would then not be linkable. But it's indeed easy to define an enumeration type if linkability is needed.

I can't actually tell what generic parameters are or what they look like, from reading the RFC. It really just glosses over them. Is it useful to be able to link to them?

There aren't many examples of generic parameters used in specifications. I don't think they are being used in any W3C specification for now in particular. I see one occurrence in RFC9581:

suffix-values = one-or-more<suffix-value>
one-or-more<T> = T / [ 2* T ]
suffix-value = tstr ...

I suspect linking is only going to be useful within the definition itself (that is, to link back T in T / [ 2* T ] to <T> so that it's clear that it is a template type name. If the comment is "let's start without them", I'm certainly fine with that ;)

Keys seem reasonable to link, but there's some complexities. First, we should make sure that string and bare-words are canonicalized together. Second, what do we do about non-string/numeric keys? They can be other types, or arbitrarily complex anonymous types. Maybe just treat those as unlinkable?

Strings and barewords are supported in the same way:

bareword = {
  bare: [+ float],
  ? "bare-word": int,
  "stringkey" ^ => int,
}

... would result in defining key locations which can be referenced through {^bareword/bare^}, {^bareword/bare-word^}, and {^bareword/stringkey^}.

Non-string keys, and in particular numeric keys, are treated as non-linkable. Some specs use them extensively, such as the Open Screen Protocol, but I cannot think of a good way to create useful defining locations (the use of integer keys is for optimization purpose, to talk about them in the prose, the spec "names" them through a CDDL comment).

Yes, for assigning blocks of cddl code to a module. I'd prefer the actual module definitions just be a cddl-module dfn type, rather than a bespoke markup pattern.

No problem introducing a new dfn type. FWIW, I had stuck to "dfn" for three reasons:

  1. WebDriver-BiDi references the module definition each time it defines a block of CDDL, so there are ~100 occurrences in the spec of references such as [=local remote definition=]. These would have to become <a cddl-module>local remote definition</a>, slightly less convenient to write. But replacement can be automated, no big deal.
  2. WebDriver-BiDi exports these definitions so that specs that extend WebDriver BiDi can attach CDDL definitions to either of these modules. These specs would also need to be updated, but they aren't many of them in any case (and an update will be useful to leverage CDDL support in any case).
  3. I felt wary of introducing a new definition type for something that's probably not going to grow much. But then, who knows, we may end up with hundreds of CDDL modules in the end...

The CDDL index contained the "local" module twice as a result.
This pulls in the latest changes from `main` and resolves conflicts that arose.
In particular, the latest changes include a re-write of the parsers that
required to also partially re-write the CDDL shorthand parsing.

In practice, as for other changes, CDDL shorthand parsing closely follows the
logic used for IDL shorthand parsing.
@tidoust
Copy link
Contributor Author

tidoust commented Jan 14, 2025

Cool, I'm gonna delay reviewing this for a bit longer as I'm this close to pushing out the big shorthand parser rewrite, which this interacts with. I'll probably have to manually fix things up for that, but that's fine, I'm happy to do so.

I thought I'd do it as a way to force me to look into how things work. I believe I managed to adjust the code, at least I get the output I expect ;) The test that fails seems unrelated to my updates (and also fails in main).

@tabatkins
Copy link
Collaborator

The code currently creates a defining location for values in both of them. In the first case, it ignores the array and parentheses enclosings because they do not introduce ambiguity from a linking perspective. In the second case, it creates values that are for the anon key, knowing that the key is, in turn, for the mytype type.

These both seem reasonable to me.

In particular, the second definition is two-level-deep, and can either be referenced with no level (as in {^"thing"^}) or with both levels (as in {^mytype/anon/"thing"^}). {^anon/"thing"^} would result in a linking error (which seems good, as it means {^anon/"foo"^} is not ambiguous).

While this seems reasonable for this case, what about when dfns are 3-deep or more? Like:

t1 {
  t2: {
    t3: v1;
  }
}

Would v1 only be linkable as v1 and t1/t2/t3/v1?

The state enumeration values would then not be linkable. But it's indeed easy to define an enumeration type if linkability is needed.

Yeah, this is the solution if we're uncomfortable with the inferences, especially for deep nesting.

I see one occurrence in RFC9581:

Okay, that's how I thought they'd probably work.

So, the issue is that the RFC doesn't seem to explicitly define the scope of a generic parameter name. I presume it's treated as a type with lexical scope covering the definition only, so two different types could use the same parameter name and it's not ambiguous. This complicates things a little; we need to make sure the ID is disambiguated (should be easy, just include the type name) and be a little more explicit when doing the auto-markup (have the marker know that it's marking a type with a generic parameter, so when it's marking the body it can tag that type explicitly as a parameter for the correct type, rather than just leaving it as a generic "cddl" autolink that might turn out ambiguous). I haven't checked, your code might already do something like this.

Or we can just leave it out, since the scope is right there; autolinking is most useful when it's linking to something elsewhere so you don't have to hold a bunch of text in your head that's not visible. But the marker still needs to know that it's a parameter so it doesn't mark it as a cddl-type autolink (which'll then fail, or worse, link to a same-named global type), so I don't think this actually reduces the required work at all.

Strings and barewords are supported in the same way:

What I meant is that strings and barewords mean the same thing, afaict, so {^bare^} and {^"bare"^} should both work and link to the same thing. This just requires the dfn to repeat itself in both forms, or linkTextVariations() (in refs/utils.py) to handle the variation for you. (I already do this for a few IDL things.)

No problem introducing a new dfn type.

Yeah, since we don't have that much CDDL in Bikeshedded specs so far (and none of it is using the new autolinking, so there will be some source editting anyway), just replacing the autolinking would be fine. I'd even group cddl-module under the {^^} syntax, personally.

And as long as the names are clear and distinct, I'm not afraid of multiplying the set of definition types. I should have been a lot more disciplined with naming from the start and used css-*, idl-*, etc names...

@tabatkins
Copy link
Collaborator

Would v1 only be linkable as v1 and t1/t2/t3/v1?

Hm, I think this model actually works fairly well. Values and keys are linkable as themselves, or with the full set of ancestor keys up to and including the type. This handles anonymous stuff pretty reasonably, consistent with what you already described.

It's possible for it to produce clashing definitions, tho, if key names aren't provided. Like foo = ["bar" , "bar" / "baz"] would produce two value definitions for foo/"bar". I guess we'd need to catch this sort of thing and suggest moving one or both values into a named type.

Third test file also update to showcase and test more nested shorthand features
and ways to disambiguate apparent duplicates.
The code now properly handles references to generic parameters within a type
definition, and does not get confused with types that have the same name and
that may be defined elsewhere.

It remains possible to end up with (insane) CDDL definitions such as:

```
T<T> = {
  T: T
}
```

The IDs are correctly set but there is no good way to disambuigate `{^T/T^}` in
such cases. Specs should not really need to reference generic parameters in
practice. This update simply disables generic `cddl` links and CDDL shorthand
syntax for now. In other words, the `T` generic parameter can only be
referenced through `<a cddl-parameter for=T>T</a>`, while `{^T/T^}` targets the
`T` key under the `T` type.

This could be improved later on if references to generic parameters turn out to
be useful in spec prose.
@tidoust
Copy link
Contributor Author

tidoust commented Jan 15, 2025

Hm, I think this model actually works fairly well. Values and keys are linkable as themselves, or with the full set of ancestor keys up to and including the type. This handles anonymous stuff pretty reasonably, consistent with what you already described.

Yes, I added a few cases to tests/cddl003.bs and confirm that works.

It's possible for it to produce clashing definitions, tho, if key names aren't provided. Like foo = ["bar" , "bar" / "baz"] would produce two value definitions for foo/"bar". I guess we'd need to catch this sort of thing and suggest moving one or both values into a named type.

Argh. Indeed. That's still to be addressed.

So, the issue is that the RFC doesn't seem to explicitly define the scope of a generic parameter name. I presume it's treated as a type with lexical scope covering the definition only, so two different types could use the same parameter name and it's not ambiguous. This complicates things a little; we need to make sure the ID is disambiguated (should be easy, just include the type name) and be a little more explicit when doing the auto-markup (have the marker know that it's marking a type with a generic parameter, so when it's marking the body it can tag that type explicitly as a parameter for the correct type, rather than just leaving it as a generic "cddl" autolink that might turn out ambiguous). I haven't checked, your code might already do something like this.

The code was doing some of that but somewhat badly... Logic now updated. The IDs were scoped to the underlying type, not a problem per se. The code should now correctly detect references to generic parameters within a definition. I added test cases in tests/cddl004.bs.

I took the liberty to disable generic cddl links and shorthand notation to reference definitions of type cddl-parameter for now. First, that's probably hardly ever going to be useful. But more importantly, it seems doable to end up with a definition such as:

T<T> = {
  T: T
}

... and there would be no good way to disambiguate between a reference to the generic parameter and a reference to the T key. In short, {^T/T^} now targets the T key defined in the T type, and one needs to write <a cddl-parameter for=T>T</a> to target the generic parameter (which, again, should not be needed much). We can certainly improve the situation later on when the need arises!

What I meant is that strings and barewords mean the same thing, afaict, so {^bare^} and {^"bare"^} should both work and link to the same thing. This just requires the dfn to repeat itself in both forms, or linkTextVariations() (in refs/utils.py) to handle the variation for you.

Understood. Now done in refs/utls.py, with tests in tests/cddl003.bs :)

So, still on the TODO list:

  1. Detect and handle duplicate keys/values
  2. Create a cddl-module dfn type
  3. Add links to RFC8610 for basic types

Code now detects when a CDDL block contains anonymous types that would create
duplicates. It dies when a key or a value is defined twice for the same CDDL
type, preventing any reference to the term. It warns when the duplicate is for
definitions that are not of the same type.

For example, an error is returned for:

```
err = [ "dupl", "dupl" ]
```

... while the following only returns an error because it remains possible to
reference individual terms by making the link type explicit:

```
warn = [ ( dupl: "val" ), "dupl" ]
```
This create a new `cddl-module` dfn type to turn the beskope markup pattern
for defining CDDL modules into a more usual mechanism. The shorthand notation
can be used to reference CDDL modules as well.
That's a more natural place to put the logic and it simplifies things a bit as
well (no need to deal with additional linking texts).
Webref now contains the prelude types defined in RFC 8610, making it possible
to link to them automatically from other specs.

Note: The updated tests are correct... provided that `spec-data/readonly` gets
updated first through `bikeshed debug --refresh-data` (not done as part of this
commit to keep updates separated)
Data in `spec-data/readonly` needs to be refreshed for definitions of CDDL
types defined in the standard prelude in RFC 8610 to start becoming available
for cross-referencing purpose.

In the meantime, this update rebases the tests that include CDDL dfns. Build
typically reports link errors because of the unavailable RFC 8610 dfns. These
errors will disappear once data in `spec-data/readonly` gets refreshed and
CDDL tests rebased again.
@tidoust
Copy link
Contributor Author

tidoust commented Jan 20, 2025

So, still on the TODO list:

  1. Detect and handle duplicate keys/values
  2. Create a cddl-module dfn type
  3. Add links to RFC8610 for basic types

These should now be done!

One thing is that, for links to RFC8610 to start working:

  • Data in spec-data/readonly needs to be updated with bikeshed debug --refresh-data so that RFC8610 dfns start appearing in the cross-references database used in tests.
  • CDDL tests tests/cddl* need to be rebased once spec-data/readonly has been refreshed, which should drop all link errors, and effectively add links to the standard prelude where needed.
  • Spec tests for specs that define CDDL need to be rebased as well (webdriver-bidi, web-bluetooth, permissions) for the same reason.

I didn't do the above as part of this PR because refreshing spec-data/readonly seems to be something that you do separately.

Side note: I ended up treating barewords and strings within the data-lt attribute instead of in refs/utils.py. I initially thought that would ease the handling of duplicates but I ended up doing that handling at a different level, so there's really no problem going back to refs/utils.py if that seems preferable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants