Description
What is your use-case and why do you need this feature?
Generic parsing of CBOR structures, just like decodeFromXXX<JsonElement>(…), is now becoming a must-have for CBOR, because the eIDAS2 regulation (commonly referred to as EU Digital Identity Wallet, EUDIW) mandates the use of ISO/IEC 18013-5:2021 (this format will also be referred to as ISO mDL). Note that the ISO standard is behind a paywall and not freely accessible, so this issue will only quote a very short part of it.
Detailed Technical Write-Up based on two concrete Examples
IssuerSignedItem as per ISO/IEC 18013-5:2021
This data structure is used during the issuing process in the EUDIW context. Quoting ISO/IEC 18013-5:2021 Section 8.1:
RFC 7049, section 3.9 describes four rules for canonical CBOR. Three of those rules shall be implemented for all CBOR structures as follows:
- integers (major types 0 and 1) shall be as small as possible;
- the expression of lengths in major types 2 through 5 shall be as short as possible;
- indefinite-length items shall be made into definite-length items.
The fourth rule regarding sorting of map keys is not required.
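To make the first two quoted rules concrete, here is a minimal sketch of shortest-form encoding for a CBOR head (major type plus argument). The names `encodeHead` and `encodeInt` are hypothetical helpers written for this issue; they are not part of kotlinx.serialization or Obor, and the third rule (definite lengths only) is not covered.

```kotlin
// Hypothetical helper: encodes a CBOR head (major type + argument)
// using the fewest bytes that can hold the argument.
fun encodeHead(majorType: Int, argument: Long): ByteArray {
    require(majorType in 0..7) { "major type must be 0..7" }
    require(argument >= 0) { "argument must be non-negative" }
    val mt = majorType shl 5
    // Big-endian representation of the argument, `count` bytes wide.
    fun be(count: Int) = ByteArray(count) { i ->
        (argument ushr (8 * (count - 1 - i))).toByte()
    }
    return when {
        argument < 24 -> byteArrayOf((mt or argument.toInt()).toByte())
        argument <= 0xFF -> byteArrayOf((mt or 24).toByte()) + be(1)
        argument <= 0xFFFF -> byteArrayOf((mt or 25).toByte()) + be(2)
        argument <= 0xFFFF_FFFFL -> byteArrayOf((mt or 26).toByte()) + be(4)
        else -> byteArrayOf((mt or 27).toByte()) + be(8)
    }
}

// Integers: major type 0 for n >= 0, major type 1 carrying (-1 - n) for n < 0.
fun encodeInt(n: Long): ByteArray =
    if (n >= 0) encodeHead(0, n) else encodeHead(1, -1 - n)
```

The same head encoding applies to the lengths of major types 2 through 5, which is why one helper can serve both rules.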
The missing requirement to sort map keys is the culprit: some properties of the IssuerSignedItem (and their types) depend on another property. If the fourth rule of canonicalisation were enforced, the type property would occur first and deserialisation would work. After all, if we know the type, we can choose a serialiser. Because ISO mDL does not enforce this, the type could be the very last property encountered during deserialisation.
Why can't we try to parse every possible type as a cascade of try-catch blocks? The reason is that the types that occur in IssuerSignedItem may be partially parsed before an error occurs. Hence, some of the bytes are already consumed and lost when a parsing error is thrown, so we cannot try to parse the property at hand as another type inside the catch block.
The only possible solution to this problem is currently to rely on Obor, because it enables us to
- decodeFromByteArray<CborObject>
- iterate over all properties inside a generic CborObject data structure
- extract the type property
- choose a deserialiser based on the type
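The steps above can be sketched without any library, which may help to show why a generic tree removes the ordering problem. Everything here is an assumption made for illustration: `CborNode` and its subtypes, `CborReader`, `decodeItem`, and the map keys "type" and "value" are hypothetical and are neither Obor's API nor the actual ISO field names. The reader only handles definite-length items of major types 0 through 5, and unsigned values above Long.MAX_VALUE would overflow.

```kotlin
// Hypothetical generic tree, comparable in spirit to JsonElement.
sealed interface CborNode
data class CborInt(val value: Long) : CborNode
data class CborBytes(val value: List<Byte>) : CborNode
data class CborText(val value: String) : CborNode
data class CborList(val items: List<CborNode>) : CborNode
data class CborDict(val entries: Map<CborNode, CborNode>) : CborNode

class CborReader(private val bytes: ByteArray) {
    private var pos = 0

    // Copies the next n bytes and advances the position.
    private fun take(n: Int): ByteArray =
        bytes.copyOfRange(pos, pos + n).also { pos += n }

    // Reads the argument that follows the initial byte (0, 1, 2, 4 or 8 bytes).
    private fun readArgument(info: Int): Long = when (info) {
        in 0..23 -> info.toLong()
        in 24..27 -> {
            var v = 0L
            repeat(1 shl (info - 24)) { v = (v shl 8) or (bytes[pos++].toLong() and 0xFF) }
            v
        }
        else -> error("unsupported additional info $info")
    }

    // Decodes one complete data item into the generic tree.
    fun read(): CborNode {
        val head = bytes[pos++].toInt() and 0xFF
        val arg = readArgument(head and 0x1F)
        return when (head shr 5) {
            0 -> CborInt(arg)
            1 -> CborInt(-1 - arg)
            2 -> CborBytes(take(arg.toInt()).toList())
            3 -> CborText(take(arg.toInt()).decodeToString())
            4 -> CborList(List(arg.toInt()) { read() })
            5 -> CborDict(buildMap<CborNode, CborNode> { repeat(arg.toInt()) { put(read(), read()) } })
            else -> error("unsupported major type ${head shr 5}")
        }
    }
}

// Hypothetical typed results, selected by the "type" entry.
sealed interface Item
data class TextItem(val text: String) : Item
data class NumberItem(val number: Long) : Item

fun decodeItem(encoded: ByteArray): Item {
    // Step 1: decode everything generically; key order no longer matters.
    val dict = CborReader(encoded).read() as? CborDict ?: error("expected a map")
    // Steps 2 and 3: look up the type entry, wherever it appeared in the input.
    val type = (dict.entries[CborText("type")] as? CborText)?.value ?: error("missing type")
    val value = dict.entries[CborText("value")] ?: error("missing value")
    // Step 4: choose the typed interpretation based on the type.
    return when (type) {
        "text" -> TextItem((value as CborText).value)
        "number" -> NumberItem((value as CborInt).value)
        else -> error("unknown type $type")
    }
}
```

Because the whole map is materialised before any typed decision is made, an input in which "value" precedes "type" decodes the same way as one in which "type" comes first.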
Why this is becoming a Must-Have
CIR 2024/2982 Article 5 (referencing its Annex), which is part of the eIDAS2 regulation, mandates the use of ISO/IEC 18013-5:2021.
Why is this relevant? The eIDAS2 regulation mandates every member state to implement an identity wallet solution that must be interoperable across the whole European Union. This is relevant right now as large-scale pilots are being carried out and the EU-wide go-live is set for 2026!
Without proper support, the default CBOR format provided here will be unfit to support digital identity wallet solutions with a target audience of hundreds of millions.
COSE Keys
COSE key parameters as per the IANA registry for COSE Key Type Parameters use overlapping COSE labels for different data types (e.g. -1 could be k (bstr), curve (int / tstr), or n (bstr)). The problem is that even when the type of a COSE key is known (e.g. RSA or EC2), certain parameters can have different types under the same label (e.g. -3 could be of type bstr or bool).
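As an illustration of the overlap, the sketch below interprets label -1 only after the whole key has been decoded into a generic map, so the position of the key type (label 1) in the input does not matter. The function `interpretMinusOne` and the result types are hypothetical names. The sketch assumes the key type is given as an integer (1 = OKP, 2 = EC2, 3 = RSA, 4 = Symmetric, as listed in the IANA registry) and that a generic decoder has already produced Long, String and ByteArray values.

```kotlin
// Hypothetical result types describing what label -1 means for a given key type.
sealed interface MinusOne
data class Curve(val id: Long?, val name: String?) : MinusOne
class RsaModulus(val n: ByteArray) : MinusOne
class SymmetricKey(val k: ByteArray) : MinusOne

// `key` is assumed to be the output of a generic CBOR decoder:
// labels as Long, values as Long, String or ByteArray.
fun interpretMinusOne(key: Map<Long, Any>): MinusOne {
    val kty = key[1L] as? Long ?: error("missing or non-integer kty")
    val raw = key[-1L] ?: error("label -1 not present")
    return when (kty) {
        // OKP and EC2: -1 is the curve, either int or tstr.
        1L, 2L -> when (raw) {
            is Long -> Curve(raw, null)
            is String -> Curve(null, raw)
            else -> error("curve must be int or tstr")
        }
        // RSA: -1 is the modulus n (bstr).
        3L -> RsaModulus(raw as? ByteArray ?: error("n must be bstr"))
        // Symmetric: -1 is the key value k (bstr).
        4L -> SymmetricKey(raw as? ByteArray ?: error("k must be bstr"))
        else -> error("unsupported kty $kty")
    }
}
```

Note that the map literal order in a caller is irrelevant here, which is exactly the property that a streaming, single-pass decoder cannot offer.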
With very careful, tedious manual try-catch parsing, it is still possible to work around the limitations of the current CBOR parser by exploiting the fact that we have a one-byte lookahead that is not advanced if the type of the value that is supposed to be parsed next does not match the current byte in the byte stream. However, a slight change in the order of the try-catch blocks, such as trying to first parse a property as a bstr instead of an int (this is a random example, it might be the other way around), will consume bytes and make it impossible to recover from an error and try to parse a property as another type, just as is the case for IssuerSignedItem (see above).
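The one-byte lookahead idea can be sketched as follows for the bstr / bool case under label -3. The names `Kind` and `peekKind` are hypothetical; this is not how the existing parser is implemented, only an illustration that the initial byte is enough to tell these two types apart without consuming any input.

```kotlin
enum class Kind { BYTES, BOOL, OTHER }

// Inspects the initial byte at `pos` without advancing any read position.
fun peekKind(input: ByteArray, pos: Int): Kind {
    val head = input[pos].toInt() and 0xFF
    return when {
        head shr 5 == 2 -> Kind.BYTES                 // major type 2: byte string
        head == 0xF4 || head == 0xF5 -> Kind.BOOL     // simple values false / true
        else -> Kind.OTHER
    }
}
```

A caller can branch on the result before invoking a typed read, which avoids the partial consumption that makes the try-catch cascade order-sensitive. It does not help when the deciding information sits in a later map entry.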
Why this is becoming a Must-Have
COSE is mandatory for ISO mDL credentials, as it specifies the implementation details of the security layer (digital signatures of credentials and encryption, etc.). The current workaround is unsustainable and hard to maintain. It also may not cover all possible legal inputs. Some legal inputs could cause an irrecoverable situation.
Describe the solution you'd like
Merge Obor into upstream to support generic CBOR parsing. While the specifications are to blame for this mess, because they make single-pass parsing without lookahead impossible, all of those bad decisions are here to stay and will affect a potential user base of hundreds of millions by 2026 at the very latest. We either catch up or we won't be part of what is probably the single largest use case for CBOR yet, backed by a legally binding EU regulation.