Comments on URL-interop.md #10
Right, clearly wrong of me. That's virtually the same as 86. I removed that mistake.
I suppose that's so too. It got removed as well now when I cleaned up the scheme flaws.
Yes, as long as you mean 32-bit numbers and you use the stock name resolver functions. The trickier part is the dotted numerical version that isn't 4 numerical fields. But still, that's not part of 86.
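For illustration, a small Python sketch contrasting a lenient resolver-style parser with a strict one. This assumes a platform whose `inet_aton` accepts the shorthand dotted forms (most do, since the behaviour comes from the classic BSD resolver):

```python
import ipaddress
import socket

# The stock resolver function inet_aton() accepts dotted forms with
# fewer than 4 fields: "127.1" is treated as 127.0.0.1.
packed = socket.inet_aton("127.1")
print(packed)  # b'\x7f\x00\x00\x01' — four network-order bytes

# A strict parser such as the ipaddress module rejects the same input,
# because it only allows the canonical four-field form.
try:
    ipaddress.IPv4Address("127.1")
except ipaddress.AddressValueError as exc:
    print("rejected:", exc)
```

The gap between these two behaviours is exactly the "exotic" IPv4 syntax discussed here: it resolves via the stock functions, but it is not part of 86's grammar.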
Both yes and no. When it comes to curl, the original approach was to only interfere where it had to and let everything else through as far as possible. So you could send in illegal things in the URL and they would be used in the end anyway, and that could help users torture their servers to send crap other clients wouldn't. Over time that has turned out to be harder and a bit error-prone, so we've had to make the parser stricter, but it still has a fairly lenient approach: the focus is that if you pass it a legal URL it should parse it and work with it. Illegal URLs are not always rejected (sort of garbage in, garbage out), but over time I think we're slowly rejecting more and more of them.
Right, but when you accept a white space as part of a URL, you need something else, another method or another character, to specify where the URL ends. In HTTP headers that other character is typically a newline. I should avoid the use of "magic" there and instead say another method or another character.
I drowned in all the other things there when I looked previously but I agree that it looks fine. I've pushed a change now that links directly to the source json file. Thanks for all the feedback. I've done several commits now to clean up.
I'm useless when it comes to anything non-ASCII so I suppose that's why I'm extra confused by all these IDNA things. Are you saying that the TR46 document makes it clear to you how to encode IDN host names when doing name resolves, and that it then works with everything, including German ß's?
Yes.
If you mean "Is implementing that spec sufficient for achieving interoperability with every domain, TLD, and registrar in the world", I don’t know. I assume Anne chose TR46 over alternatives because he thought it would provide better, if not perfect, interop. I just googled for what he wrote about this and found: https://annevankesteren.nl/2014/06/url-unicode
Then I would say that it isn't that clear to you either. A clear spec would specify the single algorithm that should be used. (And "should" doesn't mean that everyone adheres to that spec, just like with any other spec in the world.)
You’re either misunderstanding or misrepresenting what I wrote. TWUS does specify a single algorithm that, in the opinion of its editors, should be used. My “I don’t know” was a response to your “works with everything”, in the sense that “everything” is an unbounded set of things, so that question can never be answered. No single person knows all the corner cases of every piece of software that exists in the world. However, if and when we do find out that some aspect of TWUS doesn’t work with something, we can try and tweak TWUS to fix that problem.
I’m reading it at commit 8636551.
IIRC the TWUS parser only accepts input without a scheme when there’s a base URL. The input is relative, in these cases.
86 has this grammar, which seems equivalent?
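To illustrate the relative-reference behaviour, here is a sketch using Python's `urllib.parse`, which implements 86-style resolution rather than the TWUS parser, so it only approximates the point (the host name and paths are made up for the example):

```python
from urllib.parse import urljoin, urlparse

# Without a base URL, scheme-less input parses as a bare path:
# no scheme, no authority of its own.
parts = urlparse("example.org/path")
print(parts.scheme, repr(parts.netloc), parts.path)
# -> '' '' example.org/path

# With a base URL, the very same input is resolved as a relative
# reference against the base.
print(urljoin("https://host/a/b", "example.org/path"))
# -> https://host/a/example.org/path
```

This matches the observation that such input is only meaningful to a parser when a base URL is supplied; what the two specs disagree on is mostly the edge cases around it.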
In this specific instance, I think "Non-web-browser" means anything that doesn’t also implement https://w3c.github.io/FileAPI/ since the difference between "basic URL parser" and "URL parser" is all about blob: URLs.
I think this is really not a big deal. It could just as well be 5 max, but 5 is arbitrary and less theoretically pleasing than http://www.catb.org/jargon/html/Z/Zero-One-Infinity-Rule.html
When I looked into it, it seemed hard to choose to not support it in such functions. (The most a program could do is recognize such "exotic" IPv4 syntax and reject them with a parse error, if it doesn’t want to resolve the IP address.)
It specifies Unicode TR46, which fully defines the algorithms independently of IDNA 2003 or IDNA 2008. (Though it is based on the Punycode RFC.)
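A small sketch of where the ß confusion comes from: Python's built-in `idna` codec implements IDNA 2003, whose nameprep step case-folds the German sharp s to "ss", while UTS #46 non-transitional processing keeps the ß and punycodes it (the UTS #46 result is shown only as a comment, since it needs a third-party library):

```python
# IDNA 2003 (Python stdlib codec): nameprep maps ß -> "ss", so the
# label becomes plain ASCII with no Punycode involved at all.
print("straße".encode("idna"))   # b'strasse'

# UTS #46 / IDNA 2008 non-transitional processing instead keeps the ß
# and Punycode-encodes the label; e.g. with the third-party "idna"
# package (not run here):
#   idna.encode("straße")  ->  b'xn--strae-oqa'
```

So "straße.de" and "strasse.de" resolve to the same host under the old rules but to different hosts under the new ones, which is exactly the kind of interop difference being discussed.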
Personal opinion: it sounds problematic to silently ignore part of the input?
For example in
<a href="…">
HTML syntax defines exactly where the value of the href attribute ends, so there is no need for magic. If URLs need to be found in the middle of a free-form paragraph of text without any markup, a lot more magic (and heuristics) is required than splitting on spaces. I think defining this does not belong in a URL spec.
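As a rough illustration of why heuristics are needed, here is a hypothetical Python sketch; the sample text and the regex are arbitrary examples, not taken from any spec:

```python
import re

text = "See http://example.com/a, or (https://example.org/b) for details."

# Naive split on whitespace drags surrounding punctuation into the URL.
naive = [t for t in text.split() if "http" in t]
print(naive)
# -> ['http://example.com/a,', '(https://example.org/b)']

# A heuristic regex (one of many possible) that trims common trailing
# punctuation and closing parentheses; it will still get other corner
# cases wrong, which is the point.
found = re.findall(r"https?://[^\s)]+?(?=[.,)]*(?:\s|$))", text)
print(found)
# -> ['http://example.com/a', 'https://example.org/b']
```

No amount of regex tuning makes this exact, because plain prose simply does not delimit URLs the way markup does.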
Part (arguably the most important part) of this test suite has its test cases in a JSON file that can be used without JavaScript (and is in rust-url).