-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify that arbitrary unicode is allowed in user/room IDs and room aliases #1506
Merged
Merged
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
6215072
Clarify that arbitrary unicode is allowed in user/room IDs and room a…
tulir ec55c38
Merge branch 'main' into clarify-identifier-localparts
turt2live 459634f
Clarify historical ID set further
turt2live 97381fd
Apply suggestions from code review
tulir 9b75333
Clarify allowed control characters
tulir 05557b8
Merge branch 'main' into clarify-identifier-localparts
tulir 9facd55
Update content/appendices.md
tulir 01dd6df
Remove definition which isn't referenced anywhere
tulir 30dd7cf
Apply suggestions from code review
tulir File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Clarify that arbitrary unicode is allowed in user/room IDs and room aliases. | ||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strongly disagree on this.
The original vintage 2014 Synapse implementation only allowed non URL quotable characters in user localparts when registering for an account. Unfortunately, federation did not apply the same check, alluded to by other comments here:
If you are playing silly games (removing validation checks on your server) we should NOT set the precedent that we are just going to accept your games imo, otherwise where do you draw the line? Some comments in this proposal mention about excluding the null byte, but why? Sure, postgres cannot represent it when used with TEXT columns, but hey I am using a modified Synapse which doesn't have this problem, so why are we allowing some silly games but not others?
The result is an inconsistent mess, and it was never designed to be that way. This isn't like other cases where "hey this is what synapse does, in the spec it goes" because in order to get the failure mode of unicode characters you need a malicious and/or buggy actor.
The consequences of making this the rule in the specification, and hence removing these checks in Dendrite, Conduit, et al is an increased risk of homograph attacks. I cannot and will not support increasing the attack surface of Matrix just because a few people back in 2014 removed validation and sent unicode user IDs into a room, which synapse accepted.
It is worth emphasising that room versions ARE NOT a get-out-of-jail-free card here, as user IDs are outside the scope of rooms. For example, the sliding sync proxy recently had an issue with unicode user IDs in device list changes. It's not hard to see how this can also be an issue with the user directory and to-device msgs, both of which sit outside of rooms.
Counter proposal: sorry folks with smiley poos as user localparts, you're going to be broken in the next release of Synapse, and we remove / subsequently ignore events with malformed user IDs aka what we should have done in 2014.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that this only really happens because of people modifying their servers, the result of enforcing those checks in one implementation and not in another can end up being outright disastrous for innocent users.
For example, if Dendrite enforces these checks but Synapse doesn't, then all it takes is for a Synapse user with an emoji localpart to make a power level change or perform any kind of power action to break the room for Dendrite users probably irreversibly.
Hell, they might not even have to perform a power action, because we might drop auth events (such as the user joining to begin with) or refuse to pull in prev events (from normal timeline events), which in turn causes us to run state resolution with a different set of input events (which can result in a different output state set or a complete state reset) or to accidentally propagate broken state to other servers when they ask us for
/state_ids
or/state
.This situation 100% sucks but Matrix only functions if implementations agree to handle these things in the same way, otherwise different implementations will never arrive at a consistent state. We already see this happen due to other corner cases and it just ends up feeling terrible for all users involved.
As for room versions, it is true that it's not a great solution to the problem due to the fact that it still leaks into device updates or other areas, but at least it's possible in a future room version to make sure that users with invalid localparts can't join the rooms to begin with. That is a huge step towards stamping out invalid localparts.