ISUPPORT UTF8ONLY is not backwards-compatible. #456
Comments
The UTF8ONLY token only exists to let clients detect that the server is UTF-8 only. It is backwards compatible with the existing situation where servers that require UTF-8 silently break with clients which are not configured to use UTF-8. The spec does not specify any required method for handling clients that send non-UTF-8. It's entirely legal under the spec for implementations to transcode any non-UTF-8 to UTF-8 if they want.
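For illustration, here is a minimal sketch of how a client might detect the token. It assumes the token arrives as a bare `UTF8ONLY` entry in an RPL_ISUPPORT (`005`) line; the function names and the example server name are hypothetical, and the parsing skips details such as message tags and value escapes.

```python
from typing import Dict, Optional


def parse_isupport_tokens(line: str) -> Dict[str, Optional[str]]:
    """Parse the tokens of a single RPL_ISUPPORT (005) line.

    Simplified sketch: assumes no message tags and ignores value escapes.
    """
    # Drop an optional ":prefix" at the start of the line.
    if line.startswith(":"):
        _, _, line = line.partition(" ")
    # Drop the trailing human-readable text (":are supported by ...").
    head, _, _trailing = line.partition(" :")
    parts = head.split()
    # parts[0] is the numeric, parts[1] is the client's nick; tokens follow.
    if len(parts) < 2 or parts[0] != "005":
        return {}
    tokens: Dict[str, Optional[str]] = {}
    for token in parts[2:]:
        if token.startswith("-"):
            continue  # negated token; ignored in this sketch
        key, sep, value = token.partition("=")
        tokens[key] = value if sep else None
    return tokens


def server_is_utf8_only(line: str) -> bool:
    """Return True if this 005 line advertises UTF8ONLY."""
    return "UTF8ONLY" in parse_isupport_tokens(line)
```

A client that sees the token could then switch its outgoing encoding to UTF-8 for that connection, regardless of what the user configured.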
Such servers aren't really following the spirit of the backwards-compatibility principle, so it seems harmful to endorse that approach in IRCv3. As written, disconnection looks like a desired and encouraged part of the specification. Ideally it would at least say that servers SHOULD NOT drop the client for sending non-UTF8, though they may ignore individual protocol messages.
Ideally such servers would always handle these cases without disconnecting the client. However, given the amount of discussion that'd likely result from trying to specify one specific way of handling these cases, I thought it'd be best to just let the servers handle it in whatever way they find appropriate.
Unfortunately we can't make this opt-in with a CAP, since servers that only accept UTF-8 traffic already exist and they need to transcode, reject, or in some other way handle non-UTF-8 traffic from clients in line with the definition written in the spec anyway.
Definitely makes sense to discourage disconnecting the client outright. I'll play with the language there and try to PR some alternative wording that allows disconnection only as a last resort. Thanks for the note, much appreciated!
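To make the options above concrete, here is a rough sketch of a server-side handler that either transcodes or rejects a non-UTF-8 line without dropping the connection. Everything in it is an assumption for illustration: the `handle_incoming` name, the `policy` and `legacy_encoding` parameters, the choice of Latin-1 as the fallback, and the exact wording of the `FAIL ... INVALID_UTF8` standard reply. It also assumes the client line carries no tags or prefix.

```python
from typing import Optional, Tuple


def handle_incoming(
    raw: bytes,
    policy: str = "reject",
    legacy_encoding: str = "latin-1",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (message_to_process, reply_to_client) for one client line.

    The connection is never closed here; a bad line is either transcoded
    or ignored with an explanatory reply.
    """
    try:
        # Valid UTF-8: process the message as usual, no reply needed.
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        pass

    if policy == "transcode":
        # Assumption: treat invalid input as a single-byte legacy encoding.
        return raw.decode(legacy_encoding), None

    # policy == "reject": ignore only this message and tell the client why.
    first = raw.split(b" ", 1)[0]
    command = first.decode("ascii").upper() if first and first.isascii() else "*"
    reply = (
        f"FAIL {command} INVALID_UTF8 "
        ":Message rejected, your IRC software MUST use UTF-8 encoding on this network"
    )
    return None, reply
```

Either branch keeps the client connected, which is the behaviour this thread is arguing for; disconnecting would be a separate, last-resort decision made elsewhere.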
It's a tricky issue, yeah. I think a compatibility break is inherent in the intent of the specification: if a server implements the spec, it's never really going to interoperate acceptably with clients that use non-UTF8 encodings (even if you can robustly transcode input, the server will only emit UTF8, likely violating client expectations that the output encoding will agree with the input encoding). I agree with the suggestion that disconnecting the client altogether is unnecessarily aggressive and should probably be deprecated. (From the comment history on #432, it sounds like we were exploring it as the best way to get the end user's attention.)
I'll bump this issue a year later: I agree, the concept of disconnecting a client over non-UTF8 input seems heavy-handed, and it appears to be an option suggested in the UTF8ONLY spec. I would love to see this language removed.
I think this change gives a more accurate explanation of why this spec exists, and also removes the disconnection language entirely. Please let me know watcha think: https://gist.github.com/DanielOaks/02a60498e4be4ecb7d6be387eecb642a/revisions#diff-014869833613b58c7e37f5208548f4e64d8d0deb465a47d1db21da761158f143=
I think the changes improve the document, and appreciate the removal of the language referencing disconnection as a server option.
I'm OK with removing the disconnection language, but I don't like the other changes.
Is this true? I've always thought of |
Depends on your view of the protocol I guess. Some do see disallowing that as a protocol break, some responses to non-UTF-8 content (e.g. disconnecting the client) would probably classify as a protocol break, and some don't see it as a protocol break. In my reading of that sentence, I'm kind of conflating the 'decode everything as UTF-8' approach that some software takes with not following the 'traditional' treat-everything-as-octets-and-bytes direction, but I guess the token and stdreplies code themselves don't necessarily mean that 🤷
Put me in this camp :-) I found a better way to phrase my objection: the current spec language implies that non-UTF8 is legacy and UTF8 is preferred. I like this implication and I want to keep it.
One of the guiding principles of IRCv3 appears to be backwards-compatibility - from the FAQ:
This is not usefully the case for the current design of the ISUPPORT UTF8ONLY specification, since clients that do not support the specification will happily send non-UTF8 and be disconnected for violating the protocol.
To be backwards-compatible, this should be opt-in with a CAP exchange. Once a client has ACK'd UTF8ONLY, it is reasonable to expect it not to send anything that violates the UTF8ONLY specification.