
Qualifications for identification as a "codec" #204

@rvagg

This question comes up regularly: what qualifies an entry for inclusion in this table? I'm focusing in particular on the "codec" entries, IPLD and related.

Some recent discussions that point toward challenges in definitions:

  • Likecoin uses dag-cbor but wants a new codec so it can do content routing (#200, "Add likecoin-iscn codec")
  • SoftWare Heritage mostly uses git-raw, but interop with their systems might benefit from new codecs (#203, "Add SoftWare Heritage persistent IDentifiers"; also related to content routing)
  • Filecoin wants to represent CommP/CommR/CommD as CIDs, but there are practical and logical difficulties in treating those codecs as "encoding formats" (#172, "add filecoin commitment merkle root codecs"; a lengthy discussion that links a few other PRs)
  • Bitcoin Witness Commitment (#176, "Add Bitcoin Witness Commitment (0xb2)") is a union of two 32-byte byte arrays, which is arguably the same binary format as bmt. The logical format differs slightly, because only one of the two is technically a hash, but both are still 256-bit numbers. bitcoin-tx is similar: it describes either an actual transaction (including or excluding witness data) or a node in a binary merkle tree (bmt again!), and you can't disambiguate between the two, so there's some degree of nominative typing going on in using it. The same pattern repeats for most of the other blockchains in the table (at least the Bitcoin forks).
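To make the "same binary format, different logical format" point concrete, here is a minimal sketch. The decoders and field names below are hypothetical, written for illustration only, and are not the API of any IPLD library. It assumes the witness commitment block is the BIP 141 preimage (witness root hash followed by the 32-byte witness reserved value) and that a bmt node is a left hash followed by a right hash. Both decoders perform the identical 64-byte split; only the field names, i.e. the nominative type, differ.

```python
from typing import NamedTuple


class BmtNode(NamedTuple):
    # Hypothetical: a binary merkle tree node, two child hashes.
    left: bytes
    right: bytes


class WitnessCommitment(NamedTuple):
    # Hypothetical: assumed BIP 141 layout; only the first half is a hash.
    witness_root: bytes
    witness_reserved_value: bytes


def _split64(block: bytes) -> tuple[bytes, bytes]:
    """Split a 64-byte block into two 32-byte halves."""
    if len(block) != 64:
        raise ValueError("expected exactly 64 bytes")
    return block[:32], block[32:]


def decode_bmt_node(block: bytes) -> BmtNode:
    return BmtNode(*_split64(block))


def decode_witness_commitment(block: bytes) -> WitnessCommitment:
    return WitnessCommitment(*_split64(block))


block = bytes(range(64))
# Structurally indistinguishable: both are a pair of 32-byte arrays.
print(decode_bmt_node(block) == decode_witness_commitment(block))  # True
```

Nothing in the bytes tells the two apart; only the label the caller chose does.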

And what about the fact that every codec we list could also be interpreted as raw? I liked @Ericson2314's thoughts on this kind of question: ipld/specs#349 (comment)

Some excellent and clear thinking from @aschmahmann about nominative typing using codecs: ipld/specs#349 (comment)
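A small sketch of why any block "could also be interpreted as raw" and why the codec acts as a nominative tag: in a binary CIDv1 (`<version><codec><multihash>`, each integer an unsigned varint), the codec is just a prefix in front of a multihash computed over the block's bytes. The code values used are raw (0x55), bitcoin-witness-commitment (0xb2, as in #176) and sha2-256 (0x12). The helper names are hypothetical, and sha2-256 is used for simplicity even though Bitcoin itself uses double SHA-256. Nothing here checks that the bytes actually decode under the chosen codec.

```python
import hashlib

# Multicodec / multihash code values.
RAW = 0x55
BITCOIN_WITNESS_COMMITMENT = 0xB2
SHA2_256 = 0x12


def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, i: int) -> tuple[int, int]:
    """Return (value, next_offset) for the varint starting at buf[i]."""
    value, shift = 0, 0
    while True:
        byte = buf[i]
        i += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, i
        shift += 7


def cid_v1(codec: int, block: bytes) -> bytes:
    """Binary CIDv1 for `block`; the codec only affects the prefix."""
    digest = hashlib.sha256(block).digest()
    multihash = encode_uvarint(SHA2_256) + encode_uvarint(len(digest)) + digest
    return encode_uvarint(1) + encode_uvarint(codec) + multihash


def split_cid(cid: bytes) -> tuple[int, int, bytes]:
    """Split a binary CIDv1 into (version, codec, multihash bytes)."""
    version, i = decode_uvarint(cid, 0)
    codec, i = decode_uvarint(cid, i)
    return version, codec, cid[i:]


block = bytes(64)
raw_cid = cid_v1(RAW, block)
wc_cid = cid_v1(BITCOIN_WITNESS_COMMITMENT, block)
print(raw_cid != wc_cid)  # True: the codec prefix differs
print(split_cid(raw_cid)[2] == split_cid(wc_cid)[2])  # True: same multihash
```

The two CIDs name the same bytes by the same hash; the only thing the codec adds is a claim about how those bytes should be read.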

There is clearly a grey area here. We should avoid strong gatekeeping of the table where a contributor has greater expertise in their particular system than we do, but there's an educational role to play too, because many people show up with requests that clearly don't fit the purpose of this table or the definitions of "codec" that we broadly share. It's valid to say "this is an incorrect use of multicodec / CID" where it clearly is. What we need is a better shared understanding of where those "clear" boundaries lie.

Thoughts please!

(/cc @warpfork who isn't in the Assignees list)