New work item: crate r2c2_iri
#5
@@ -0,0 +1,14 @@
[package]
name = "r2c2_iri"
nit: it seems most crates are using `-` instead of `_` nowadays, hence `r2c2-iri`
looking at crates.io, it does not look like one is largely dominating the other...
I personally prefer the underscore (`_`) to keep the name of the crate consistent with how it is named in code.
But I won't die on that hill.
Indeed, this seems to be unclear. My bad. I tend to prefer `-` for consistency with the keys in Cargo.toml. I won't die on this hill either; let's choose something and stick with it.
Thank you for this!

I am not sure this is the first work item I would like to tackle. Indeed, IRI resolution and validation seem to me to be mostly an implementation concern that is not much exposed in public APIs. Hence, I am not sure it will be a key enabler for interoperability. However, it has the advantage of being fairly self-contained.

On IRI resolution, it's sadly a bit of an unspecified minefield between the RDF approach, which is "do not change IRIs", and the web browser approach, which is "normalize everything". For example, should resolving

I would love to avoid the dependency on regex because it's quite heavyweight. If we move this forward, maybe we can start with regex and then work on a handwritten implementation that might be even faster because it would skip the regex parsing and validator construction cost.
@Tpt thanks for your comments. I agree that IRI resolution may be left to implementations. On the other hand, the common API will need to return "something that is a valid IRI". We could either use a trait for that (something like

It seems to me that the 2nd option makes it a bit easier on users (i.e. developers consuming RDF crates via the R2C2 API), and should be relatively straightforward and uncontroversial (but I may be overly optimistic here).
Is it really unspecified, or is it just that WHATWG and browsers are trying to push an algorithm that is subtly different from what RFC3986 says? (honest question, I don't have the answer)
No problem with your proposed 2-staged approach.
This is an interesting topic. If we enforce in the API that IRI must be valid then it means that code will have to do IRI validation in a lot of places or rely on an
There is indeed the WHATWG URL standard, which differs from RFC3986 by mandating normalisation of escape sequences and by allowing some IRI processing that is invalid according to RFC3987 (resolving relative file:// URLs with Windows path's

I was thinking more about small issues in RDF and its syntaxes like Turtle, which mandate not touching IRIs at all, except when doing relative IRI reference resolution, which also affects absolute IRIs. See this issue for example. Sorry, I had misremembered RFC3986 and forgot that it always mandates dot segment removal when parsing relative IRIs.

Anyway, I tend to think our library can follow the RFC3986 resolution algorithm closely and leave this issue up to parser implementations.
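To make the dot segment removal mentioned above concrete, here is a minimal Rust sketch of the algorithm described in RFC 3986, section 5.2.4. The names `remove_dot_segments` and `pop_last_segment` are hypothetical and are not taken from this PR or from any existing crate. The sketch handles only the path component, not the full reference resolution algorithm.

```rust
/// Removes "." and ".." segments from a path, following the steps
/// (A to E) of RFC 3986, section 5.2.4. Hypothetical helper, for illustration.
fn remove_dot_segments(path: &str) -> String {
    let mut input = path;
    let mut output = String::with_capacity(path.len());
    while !input.is_empty() {
        if let Some(rest) = input.strip_prefix("../") {
            // Step A: drop a leading "../"
            input = rest;
        } else if let Some(rest) = input.strip_prefix("./") {
            // Step A: drop a leading "./"
            input = rest;
        } else if input.starts_with("/./") {
            // Step B: replace a leading "/./" with "/"
            input = &input[2..];
        } else if input == "/." {
            // Step B: replace a final "/." with "/"
            input = "/";
        } else if input.starts_with("/../") {
            // Step C: replace a leading "/../" with "/" and drop the last output segment
            input = &input[3..];
            pop_last_segment(&mut output);
        } else if input == "/.." {
            // Step C: same as above for a final "/.."
            input = "/";
            pop_last_segment(&mut output);
        } else if input == "." || input == ".." {
            // Step D: a lone "." or ".." is simply discarded
            input = "";
        } else {
            // Step E: move the first segment (with its leading "/", if any) to the output
            let start = if input.starts_with('/') { 1 } else { 0 };
            let end = input[start..].find('/').map_or(input.len(), |i| i + start);
            output.push_str(&input[..end]);
            input = &input[end..];
        }
    }
    output
}

/// Removes the last segment of `output`, together with its preceding "/" if any.
fn pop_last_segment(output: &mut String) {
    match output.rfind('/') {
        Some(i) => output.truncate(i),
        None => output.clear(),
    }
}
```

With the example paths given in RFC 3986 itself, `remove_dot_segments("/a/b/c/./../../g")` yields `/a/g` and `remove_dot_segments("mid/content=5/../6")` yields `mid/6`.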
Yes. Either the text is known to be a valid IRI, and
I can sympathize with a specific implementation going down that path (that's what mine does, to some extent). But for a common API aiming at interoperability, I'd rather strictly follow the standard...
Makes sense! Agreed!
Thinking a little more about this... If this crate does not include IRI resolution, and is limited to a wrapper type guaranteeing valid IRI syntax, then it may not make sense to have a separate crate for this: it will be a very small crate, and the common API will need other similar wrappers (e.g. for language tags), so it might just make sense to bundle the IRI wrapper in a bigger crate for terms...
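For readers unfamiliar with the pattern, a wrapper type guaranteeing valid IRI syntax could look roughly like the sketch below. All names (`Iri`, `InvalidIri`, `new`, `new_unchecked`, `looks_like_absolute_iri`) are hypothetical and do not describe the API of `r2c2_iri` or of any existing crate. The validity check is a deliberately simplified assumption (a scheme, a colon, and no obviously forbidden characters), not the full RFC 3987 grammar.

```rust
use std::borrow::Borrow;

/// Error returned when the wrapped text fails the (simplified) check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InvalidIri(pub String);

/// Wrapper around any string-like type, meant to guarantee IRI syntax.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Iri<T: Borrow<str>>(T);

impl<T: Borrow<str>> Iri<T> {
    /// Checks the text and wraps it only if the check passes.
    pub fn new(text: T) -> Result<Self, InvalidIri> {
        let valid = looks_like_absolute_iri(text.borrow());
        if valid {
            Ok(Iri(text))
        } else {
            let s: &str = text.borrow();
            Err(InvalidIri(s.to_owned()))
        }
    }

    /// Wraps the text without checking it; the caller vouches for its validity.
    pub fn new_unchecked(text: T) -> Self {
        Iri(text)
    }

    /// Gives access to the underlying text.
    pub fn as_str(&self) -> &str {
        self.0.borrow()
    }
}

/// Simplified placeholder check (assumption): NOT a complete RFC 3987 validator.
/// It accepts `scheme ":" rest`, where the scheme follows the RFC 3986 scheme
/// syntax and the rest contains no whitespace, control characters, or
/// characters that are never allowed unescaped in an IRI.
fn looks_like_absolute_iri(s: &str) -> bool {
    let Some((scheme, rest)) = s.split_once(':') else {
        return false;
    };
    let mut chars = scheme.chars();
    let scheme_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    scheme_ok
        && !rest.chars().any(|c| {
            c.is_whitespace()
                || c.is_control()
                || matches!(c, '<' | '>' | '"' | '{' | '}' | '|' | '\\' | '^' | '`')
        })
}
```

Because the wrapper is generic over `Borrow<str>`, the same type works for `Iri<&str>` and `Iri<String>`. An unchecked constructor lets code that already knows the text is valid skip the validation cost.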
This is an interesting topic. If we merge it into the bigger crate for terms, then it means that the possible IRI resolution library will need to depend on the term library. But I am not sure it's a big deal; outside of RDF usage I guess people would be much better off with the
This is a proposal for a new work item. Please provide feedback (as PR reviews, comments or 👍/👎) by 2025-03-31.
This "utility" crate is meant to provide:

- `str` with the guarantee that it is a valid IRI / IRI reference

I have published an analysis of 4 similar crates (including my own, sophia_iri), and came up with a number of lessons learned, in order to inform the design choices of `r2c2_iri`. Full disclosure, the conclusions lean toward the design choices of sophia_iri -- in fact, I had performed a preliminary version of this analysis while working on Sophia, so this is not a coincidence.

edited: make the second bullet optional, following @Tpt's comment