-
Notifications
You must be signed in to change notification settings - Fork 404
MSC4258: Federated User Directory #4258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,162 @@ | ||||||
# MSC4258: Federated User Directory | ||||||
|
||||||
Currently user search can only be done locally, which would at best get a list of all users known to the server. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nitpick: I know what is meant, but the grammar of the second half of that sentence seems weird or at least difficult to parse. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
This proposal aims at introducing a federation endpoint allowing servers to broadcast the search to the current federating servers and get results among all their known users. | ||||||
|
||||||
Improvement of the client API is also proposed to accommodate the fact that results will arrive asynchronously and to allow users to manage their visibility on search results. | ||||||
|
||||||
|
||||||
## Proposal | ||||||
|
||||||
### Federation endpoint | ||||||
|
||||||
We first propose a new federation endpoint similar to the [current client API](https://spec.matrix.org/v1.12/client-server-api#post_matrixclientv3user_directorysearch). | ||||||
It would be authenticated and rate limited. | ||||||
|
||||||
#### `POST /_matrix/federation/v3/user_directory/search` | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are the valid error conditions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tagging on to this, the profile federation API has a 403 to let server admins deny profile look-up. This might be good to have on the user directory API as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can see several cases:
|
||||||
|
||||||
#### Request | ||||||
```json | ||||||
{ | ||||||
"limit": 10, | ||||||
"search_term": "foo" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there guidance on how the search term is used? Is it the same as the current API? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Current spec is telling to search the Matrix ID and the display name and for now this proposal doesn't change that. Should we change that to also search all profile fields instead ? I am not sure, opinions welcome here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. once the user id search term contains (part of) a domain name, we can probably stop asking random servers? Such as search term "@Steve:matri" probably does not need to query the etke.cc server for users? I am just a little worried about the performance impact of these searches. (although most search will probably NOT contain domain parts. Not sure really. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should be left as an implementation detail: we added |
||||||
} | ||||||
``` | ||||||
|
||||||
#### Response | ||||||
```json | ||||||
{ | ||||||
"limited": false, | ||||||
"results": [ | ||||||
{ | ||||||
"avatar_url": "mxc://bar.com/foo", | ||||||
"display_name": "Foo", | ||||||
"m.tz": "America/New_York", | ||||||
"user_id": "@foo:bar.com", | ||||||
"visibility": "local", | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we may have discussed this in another thread already but since I cannot find it: I don't think this should be exposed in the response regardless of whether it's a local or a remote query. It is a user configuration and I cannot think of a reason why the requester would need to know whether they found the user because their visibility setting was There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is only in the federation response, we removed it from the client one after switching to account data. It is useful to be able to apply It could be omitted if we pass the requester in the request, then the remote server is able to calculate if it should return the result or not. We were more thinking about caching when designing the API, since having the visibility allows to cache the request for all users while we can only cache the result per user if we don't have the visibility. Perhaps we should just return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I see. Sorry, I had mistaken this as the CS response. 🤦♂️ In that case this seems acceptable to me since the field is required for filtering by the server and being stripped out in the CS response. |
||||||
} | ||||||
] | ||||||
} | ||||||
``` | ||||||
|
||||||
All profile fields (cf [MSC4133](https://github.com/matrix-org/matrix-spec-proposals/pull/4133)) should be returned here. | ||||||
|
||||||
On top of that, the server should include a `visibility` field that should mirror the user preference stored in the account data, cf `m.user_directory` definition later on. | ||||||
|
||||||
When an user calls the client user search API, the server should send a federated user search request to all known servers or a subset of it. It would then receive the results and return them to the user. | ||||||
Servers must not forward this request to other servers and only return results known locally. This is to avoid infinite loop between servers knowing each other. | ||||||
|
||||||
However we have a problem here: we can't have expectation on when and even if servers will answer to the search request. | ||||||
|
||||||
We are hence proposing some changes to the client API to accommodate the need to have a way to stream new results to the client. | ||||||
|
||||||
### Client endpoint changes | ||||||
|
||||||
#### New account data to control user visibility in the directory | ||||||
|
||||||
We propose to add a new account data of type `m.user_directory` with a single `visibility` field to give the user the ability to control their visibility in the user directory. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For rooms, their directory visibility is integrated into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a strong opinion, let's wait for others opinion on that I think. |
||||||
|
||||||
Different values are possible : | ||||||
- `hidden` : not visible to anyone | ||||||
- `local` : visible only to local homeserver users | ||||||
- `restricted`: visible to any user sharing a room with | ||||||
- `remote`: visible to users on local and remote homeservers | ||||||
Comment on lines
+61
to
+64
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For rooms, the visibility values are There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I have a small preference to avoid the ambiguity, but both are fine for me. |
||||||
|
||||||
If no value is provided (or it is null), the user hasn't set a preference and the server should follow the current expected behavior (visible if sharing a room in common or in public room). | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To follow up on #4258 (comment), what I meant is something like this:
Suggested change
|
||||||
|
||||||
Example of the content of a `m.user_directory` account data: | ||||||
```json | ||||||
{ | ||||||
"visibility": "local" | ||||||
} | ||||||
``` | ||||||
|
||||||
#### POST /_matrix/client/v3/user_directory/search | ||||||
|
||||||
#### Request | ||||||
```json | ||||||
{ | ||||||
"limit": 10, | ||||||
"search_term": "foo", | ||||||
"search_token": "a1d29g4f73", | ||||||
"search_scope": "remote" | ||||||
} | ||||||
``` | ||||||
|
||||||
We propose to introduce a reactive mechanism to allow the server to stream new results to the client. | ||||||
|
||||||
For that we introduce a `search_token` to the request coming from the response of a previous search(`search_token`). A request containing a `search_token` will stall until new results are available to the server. If some more results are expected to be returned, it may include another `search_token`, and hence. | ||||||
|
||||||
`search_token` is optional within the request so proposed changes are retro-compatible. | ||||||
|
||||||
We also propose a new `search_scope` parameter to limit the scope of a search. | ||||||
Possible values are: | ||||||
- `local` : only search users local to the homeserver, this must not trigger a federated search | ||||||
- `restricted`: search users known to this homeserver, this must not trigger a federated search | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would the server still have to trigger federated profile queries for external users in this case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure honestly. I think it may be left as an implementation detail for now at least, and if it turns out that one way or the other is quite superior during implementation we can specify it. |
||||||
- `remote`: on top of users known to this homeserver, it should include results coming from other homeservers via the newly introduced federated search endpoint. | ||||||
|
||||||
This parameter is optional and default to remote. | ||||||
|
||||||
#### Response | ||||||
```json | ||||||
{ | ||||||
"search_token": "a1d29g4f73", | ||||||
"limited": true, | ||||||
"results": [ | ||||||
{ | ||||||
"avatar_url": "mxc://bar.com/foo", | ||||||
"display_name": "Foo", | ||||||
"m.tz": "America/New_York", | ||||||
"user_id": "@foo:bar.com" | ||||||
} | ||||||
] | ||||||
} | ||||||
``` | ||||||
`search_token` : is a unique identifier that means that more results can be retrieved by querying with this `search_token`. `limited` should be `true` when `search_token` is returned. | ||||||
|
||||||
The server should strip `visibility` from the results returned by other servers. | ||||||
|
||||||
## Potential issues | ||||||
|
||||||
We may have requests lost or getting timeout from intermediary network equipment, especially since we are using some kind of long polling. | ||||||
We think the fact that we use a `search_token` that changes on each request allow the server to track correctly if new search results were already received by the client or not. | ||||||
|
||||||
## Alternatives | ||||||
|
||||||
We first thought about using an account data, however it has a big caveat: remote servers can't access it, hence remote servers will not be able to honor the visibility when trying to return remote users that are already visible locally to them. | ||||||
MatMaul marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
Rather than using a `search_token`, we could use a `search_id` that will be the same for all subsequent calls. | ||||||
This solution is less informative about the progression of the search from the server perspective, cf `Potential issues` section. | ||||||
|
||||||
## Security considerations | ||||||
|
||||||
### Sensitive Data Exposure | ||||||
|
||||||
A malicious server could list all user matrix ids that are defined in `remote` or `restricted`. | ||||||
|
||||||
#### Data Exposure Mitigation recommendations | ||||||
|
||||||
The federation search endpoint should be rate limited. | ||||||
|
||||||
We recommend to not answer for `search_term` with less than 3 characters like "a" or "at". | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so @A:matrix.org could never be found? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The implementation could just return the exact match but not the rest. I'll add it. |
||||||
|
||||||
#### Trust & Safety recommendations | ||||||
|
||||||
We recommend to log requests (or at least their count) from each server in order to be able to identify and ban the malicious servers who are trying to scrap all visible (including `restricted` ones) users profiles of the federation. | ||||||
|
||||||
Before, a server needed to join a room to list the users in a room (`restricted`). This scenario is logged in the room state. | ||||||
Now with this change, it is possible to list all the restricted users from other servers with no trace left at the protocol level. | ||||||
|
||||||
|
||||||
## Unstable prefix | ||||||
|
||||||
`fr.tchap.user_directory.visibility` should be used as an unstable identifier for the profile field. | ||||||
|
||||||
`/_matrix/federation/unstable/fr.tchap/user_directory/search` should be used as an unstable federation endpoint. | ||||||
|
||||||
|
||||||
## Dependencies | ||||||
|
||||||
This MSC references [MSC4133](https://github.com/matrix-org/matrix-spec-proposals/pull/4133) but does not depend on it. | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation requirements: