MSC4258: Federated User Directory #4258
Co-authored-by: Maghen Calinghee <[email protected]> Co-authored-by: Olivier Delcroix <[email protected]> Co-authored-by: Yoan Pintas <[email protected]> Co-authored-by: Nicolas Buquet <[email protected]>
We are planning to implement the MSC in the coming months; however, we would happily take early feedback in the meantime.
Implementation requirements:
- Client
- Server
```json
{
  "limit": 10,
  "search_term": "foo"
}
```
Is there guidance on how the search term is used? Is it the same as the current API?
The current spec says to search the Matrix ID and the display name, and for now this proposal doesn't change that.
Should we change that to also search all profile fields instead? I am not sure; opinions welcome here.
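A minimal sketch of the matching rule discussed above. The current client API asks for a case-insensitive search on user IDs and display names but leaves the exact algorithm to the homeserver, so the substring rule and the helper name `matches` are assumptions for illustration only.

```python
from typing import Optional


def matches(search_term: str, user_id: str, display_name: Optional[str]) -> bool:
    """Case-insensitive substring match on the Matrix user ID or display name."""
    term = search_term.casefold()
    if term in user_id.casefold():
        return True
    # Users without a display name can only match on their user ID.
    return display_name is not None and term in display_name.casefold()
```

Extending the search to all profile fields would mean looping over the extra fields in the same way.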
Once the user ID search term contains (part of) a domain name, can we stop asking random servers? For example, the search term "@Steve:matri" probably does not need to query the etke.cc server for users. I am just a little worried about the performance impact of these searches (although most searches will probably NOT contain domain parts; I'm not really sure).
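A sketch of the optimization suggested here, assuming the homeserver keeps a set of known server names. If the term looks like a partial Matrix ID (`@localpart:partial.domain`), only servers whose name starts with the partial domain are queried. `domain_hint` and `servers_to_query` are hypothetical helpers, not part of the proposal.

```python
from typing import Iterable, List, Optional


def domain_hint(search_term: str) -> Optional[str]:
    """Return the (partial) server name of a term like '@Steve:matri', if any."""
    if search_term.startswith("@") and ":" in search_term:
        hint = search_term.split(":", 1)[1].casefold()
        return hint or None  # '@Steve:' carries no usable hint
    return None


def servers_to_query(search_term: str, known_servers: Iterable[str]) -> List[str]:
    """Known servers worth asking; all of them when the term has no domain hint."""
    hint = domain_hint(search_term)
    servers = sorted(known_servers)
    if hint is None:
        return servers
    return [s for s in servers if s.casefold().startswith(hint)]
```

With `["matrix.org", "etke.cc"]` as known servers, the term `@Steve:matri` would only be sent to `matrix.org`.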
I think this should be left as an implementation detail: we added "or a subset of it" at L47, so the homeserver can be quite liberal in its caching and requesting behavior.
We first propose a new federation endpoint similar to the [current client API](https://spec.matrix.org/v1.12/client-server-api#post_matrixclientv3user_directorysearch).
It would be authenticated and rate limited.

#### `POST /_matrix/federation/v3/user_directory/search`
What are the valid error conditions?
Tagging on to this, the profile federation API has a 403 to let server admins deny profile look-up. This might be good to have on the user directory API as well.
I can see several cases:
- an empty list should be returned if all matching users have visibility settings that would hide them.
- the federated user directory is fully disabled on the server. 404 could be used here.
- the federated user directory is restricted to a set of allowed servers, for example. We should probably use 403 then.
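These cases could map onto a federation handler roughly as follows. This is a hypothetical sketch: `DirectoryConfig` and `handle_federated_search` are illustrative names, and the response body is assumed to mirror the client API (`limited` plus `results`). `M_NOT_FOUND` and `M_FORBIDDEN` are existing Matrix error codes.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class DirectoryConfig:
    federation_enabled: bool = True
    # None means any server may query; otherwise only these server names.
    allowed_servers: Optional[Set[str]] = None


def handle_federated_search(
    config: DirectoryConfig, origin: str, visible_results: List[dict], limit: int
) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) for a federated user directory search.

    `visible_results` must already exclude users whose visibility hides them,
    so "everyone is hidden" is simply an empty result list with a 200.
    """
    if not config.federation_enabled:
        return 404, {"errcode": "M_NOT_FOUND", "error": "Federated user directory is disabled"}
    if config.allowed_servers is not None and origin not in config.allowed_servers:
        return 403, {"errcode": "M_FORBIDDEN", "error": "Server not allowed to search this directory"}
    return 200, {"limited": len(visible_results) > limit, "results": visible_results[:limit]}
```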

All profile fields (cf. [MSC4133](https://github.com/matrix-org/matrix-spec-proposals/pull/4133)) should be returned here.

When a user calls the client user search API, the server should send a federated user search request to all known servers. It would then receive the results and return them to the user.
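A sketch of the fan-out described above, using Python's `asyncio`. It assumes a caller-supplied `query_server` coroutine that performs the authenticated federation request (not shown). Servers that fail or time out are skipped, results are deduplicated by `user_id`, and the list is truncated to `limit`. The timeout value and the merge rule (the first server in the list wins for a given user) are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List

# Assumed signature: (server_name, search_term, limit) -> list of profile dicts.
QueryFn = Callable[[str, str, int], Awaitable[List[dict]]]


async def federated_search(
    search_term: str,
    limit: int,
    servers: Iterable[str],
    query_server: QueryFn,
    timeout: float = 5.0,
) -> dict:
    async def query_one(server: str) -> List[dict]:
        try:
            return await asyncio.wait_for(query_server(server, search_term, limit), timeout)
        except Exception:
            # A slow or failing server should not break the whole search.
            return []

    # Query all servers concurrently; gather keeps results in server order.
    batches = await asyncio.gather(*(query_one(s) for s in servers))

    merged: Dict[str, dict] = {}
    for batch in batches:
        for profile in batch:
            merged.setdefault(profile["user_id"], profile)  # first server in order wins
    results = list(merged.values())
    return {"limited": len(results) > limit, "results": results[:limit]}
```

Querying only a subset of servers, as suggested later in this thread, would just mean passing a shorter `servers` list.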
This sounds really really expensive for the server.
This could benefit from #4259 🙂
Indeed MSC #4259 could/should be mentioned as a possibility to build upon?
> This sounds really really expensive for the server.

We should probably relax the requirement to include all known servers so that implementations can optimize it if needed, for example "to all known servers or a subset of it"?
I am not sure the impact will be that heavy, however: typing in a room that contains almost all servers, like Matrix HQ, will similarly broadcast events to all of them.

> This could benefit from #4259 🙂
> Indeed MSC #4259 could/should be mentioned as a possibility to build upon?

Could you elaborate on what you both have in mind? For now it can help a bit with receiving profile updates if we have a local search cache, but I don't really see anything else.
- `restricted`: visible to any user sharing a room with them
- `remote` (or `federated` or `public`?): visible to users on local and remote homeservers

If no value is provided (or it is null), the user hasn't set a preference and the server should follow the current expected behavior (visible if sharing a room in common or in a public room).
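A sketch of how a homeserver might decide whether a target user is visible to a requester under these values. It reads `restricted` literally as "shares a room with the requester" (the text does not say whether all local users are also included) and treats a missing, null or unknown value as "no preference". The optional `server_default_visible` flag models the "MAY still be visible" wording suggested later in this thread. `is_visible` and its parameters are hypothetical.

```python
from typing import Optional


def is_visible(
    visibility: Optional[str],
    requester_is_local: bool,
    shares_room: bool,
    target_in_public_room: bool,
    server_default_visible: bool = False,
) -> bool:
    """Decide whether a target user appears in a requester's directory search."""
    if visibility == "hidden":
        return False
    if visibility == "local":
        return requester_is_local
    if visibility == "restricted":
        # Read literally: only users who share a room with the target.
        return shares_room
    if visibility == "remote":
        return True
    # Missing, null or unknown value: no preference, keep current behavior.
    # Visible when sharing a room or when in a public room; the server
    # may choose to show the user in other cases too.
    return shares_room or target_in_public_room or server_default_visible
```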
"visible if sharing a room in common or in public room" is actually only the minimum requirement.
This proposal would change that, and user defined visibility would prevail, as expected by the users :)
- introduce `search_scope` in the client API
Thanks, it is such a hard problem (mainly performance and privacy). There is a reason why there is no federated email directory...
A few minor comments inline.
# MSC4258: Federated User Directory

Currently user search can only be done locally, which would at best get a list of all users known to the server.
nitpick: I know what is meant, but the grammar of the second half of that sentence seems weird or at least difficult to parse.

The federation search endpoint should be rate limited.
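The proposal does not prescribe a rate-limiting technique; one common option is a token bucket keyed by the origin server name, sketched below purely as an illustration. A rejected request would typically be answered with the standard Matrix `429` / `M_LIMIT_EXCEEDED` error. `PerServerRateLimiter` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PerServerRateLimiter:
    """Token bucket per origin server: `burst` requests at once, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = float(burst)
        self.clock = clock
        # origin server name -> (tokens left, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, origin: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(origin, (self.burst, now))
        # Refill for the time elapsed since the last request, capped at `burst`.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[origin] = (tokens - 1.0, now)
            return True
        self.buckets[origin] = (tokens, now)
        return False
```

Injecting `clock` keeps the limiter testable without sleeping.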

We recommend not answering a `search_term` with fewer than 3 characters, like "a" or "at".
so @A:matrix.org could never be found?
The implementation could just return the exact match but not the rest. I'll add it.
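A sketch of this compromise: terms shorter than the minimum length only return exact matches (on the localpart, full user ID or display name), so `@A:matrix.org` stays findable by searching "a". `MIN_TERM_LENGTH`, `answer_search` and the exact-match rules are assumptions for illustration.

```python
from typing import List

MIN_TERM_LENGTH = 3  # "a" or "at" are too short for a full search


def localpart(user_id: str) -> str:
    """'@A:matrix.org' -> 'a' (casefolded)."""
    return user_id[1:].split(":", 1)[0].casefold()


def answer_search(term: str, users: List[dict], min_len: int = MIN_TERM_LENGTH) -> List[dict]:
    t = term.casefold()
    if not t:
        return []
    if len(t) < min_len:
        # Short term: only return exact matches, never a broad listing.
        return [
            u for u in users
            if t in (localpart(u["user_id"]), u["user_id"].casefold(),
                     (u.get("display_name") or "").casefold())
        ]
    # Normal term: case-insensitive substring match on user ID or display name.
    return [
        u for u in users
        if t in u["user_id"].casefold() or t in (u.get("display_name") or "").casefold()
    ]
```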
Suggested change:
- Before: Currently user search can only be done locally, which would at best get a list of all users known to the server.
- After: Currently user search can only be done locally, which lists - at maximum - all users known to the user's server.
"display_name": "Foo",
"m.tz": "America/New_York",
"user_id": "@foo:bar.com",
"visibility": "local",
I think we may have discussed this in another thread already, but since I cannot find it: I don't think this should be exposed in the response, regardless of whether it's a local or a remote query. It is a user configuration, and I cannot think of a reason why the requester would need to know whether they found the user because their visibility setting was `local`, `restricted` or `remote`.
This is only in the federation response; we removed it from the client one after switching to account data. It is useful for applying `restricted` visibility on the requester's server.
It could be omitted if we pass the requester in the request; then the remote server is able to calculate whether it should return the result or not.
We were thinking more about caching when designing the API: having the visibility allows caching the response for all users, while without it we can only cache the result per user.
Perhaps we should just return `"visibility": "restricted"` and nothing otherwise as a tradeoff; I am not sure. But yeah, the current state is too leaky, I agree.
Oh, I see. Sorry, I had mistaken this for the CS response. 🤦‍♂️
In that case this seems acceptable to me since the field is required for filtering by the server and being stripped out in the CS response.
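A sketch of the filtering step this thread settles on. The requesting server drops `restricted` entries whose user does not share a room with the local requester, then strips `visibility` before building the client response. It also drops `hidden` and `local` entries defensively, on the assumption that a remote server should never return them. `filter_remote_results` and the `shares_room_with` callback are hypothetical.

```python
from typing import Callable, List

# Values a remote server should never send us; dropped defensively (assumption).
NOT_FOR_REMOTE = {"hidden", "local"}


def filter_remote_results(results: List[dict], shares_room_with: Callable[[str], bool]) -> List[dict]:
    """Apply remote visibility for the local requester, then strip `visibility`."""
    client_results = []
    for profile in results:
        visibility = profile.get("visibility")
        if visibility in NOT_FOR_REMOTE:
            continue
        if visibility == "restricted" and not shares_room_with(profile["user_id"]):
            continue
        # The client-server response never exposes the visibility setting.
        client_results.append({k: v for k, v in profile.items() if k != "visibility"})
    return client_results
```

Because the remote response carries the visibility, it can be cached once per search term and filtered per requester, which is the caching tradeoff described above.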

#### New account data to control user visibility in the directory

We propose to add a new account data event of type `m.user_directory` with a single `visibility` field, to give users the ability to control their visibility in the user directory.
For rooms, their directory visibility is integrated into `/createRoom`. Would it make sense to do something similar for users and integrate their initial visibility choice with user registration?
I don't have a strong opinion; let's wait for others' opinions on that, I think.
- `hidden`: not visible to anyone
- `local`: visible only to local homeserver users
- `restricted`: visible to any user sharing a room with them
- `remote`: visible to users on local and remote homeservers
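A sketch of reading this preference from a user's account data, assuming the content shape `{"visibility": "<value>"}` described above. Clients would set it through the existing account data endpoint (`PUT /_matrix/client/v3/user/{userId}/account_data/m.user_directory`). Treating unrecognized values as "no preference" is an assumption; `directory_visibility` is a hypothetical helper.

```python
from typing import Any, Mapping, Optional

ACCOUNT_DATA_TYPE = "m.user_directory"
VISIBILITY_VALUES = {"hidden", "local", "restricted", "remote"}


def directory_visibility(account_data: Mapping[str, Mapping[str, Any]]) -> Optional[str]:
    """Return the user's directory visibility, or None when no preference is set."""
    content = account_data.get(ACCOUNT_DATA_TYPE)
    if not content:
        return None
    value = content.get("visibility")
    # Missing, null or unrecognised values all mean "no preference".
    return value if isinstance(value, str) and value in VISIBILITY_VALUES else None
```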
For rooms, the visibility values are `private` and `public`. This could be mimicked here by using `private`, `local`, `restricted` and `public`. The terms "private" and "public" have been deemed ambiguous in the past. At the same time, however, maybe we're making things worse by adding yet another terminology. I don't have a strong opinion myself here, to be honest.
I think I have a small preference to avoid the ambiguity, but both are fine for me.
To follow up on #4258 (comment), what I meant is something like this:
Suggested change:
- Before: If no value is provided (or it is null), the user hasn't set a preference and the server should follow the current expected behavior (visible if sharing a room in common or in public room).
- After: If no value is provided (or it is null), the user hasn't set a preference and the server should follow the current expected behavior (MUST be visible if sharing a room in common or in public room, MAY still be visible in all other cases if the server chooses so).
We also propose a new `search_scope` parameter to limit the scope of a search.
Possible values are:
- `local`: only search users local to the homeserver; this must not trigger a federated search
- `restricted`: search users known to this homeserver; this must not trigger a federated search
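A sketch of how a homeserver might plan a search from `search_scope`. The list above appears truncated after `restricted`, so this assumes any other value, or no value, permits a federated search. `plan_search` and `server_of` are hypothetical helpers and the user IDs are illustrative.

```python
from typing import Iterable, Optional


def server_of(user_id: str) -> str:
    """'@x:a.org' -> 'a.org'."""
    return user_id.split(":", 1)[1]


def plan_search(search_scope: Optional[str], server_name: str, known_user_ids: Iterable[str]) -> dict:
    known = list(known_user_ids)
    if search_scope == "local":
        # Only users registered on this homeserver; no federation traffic.
        return {"candidates": [u for u in known if server_of(u) == server_name], "federate": False}
    if search_scope == "restricted":
        # Every user this homeserver already knows about; still no federation.
        return {"candidates": known, "federate": False}
    # Assumption: any other scope (or none) may also trigger a federated search.
    return {"candidates": known, "federate": True}
```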
Would the server still have to trigger federated profile queries for external users in this case?
I am not sure honestly. I think it may be left as an implementation detail for now at least, and if it turns out that one way or the other is quite superior during implementation we can specify it.
This proposal was conceived and written by me and the people listed below, all employed by the French state for the Tchap project.
@mcalinghee @odelcroi @yostyle @NicolasBuquet