-
Notifications
You must be signed in to change notification settings - Fork 399
MSC2246: Asynchronous media uploads #2246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 12 commits
b439277
a83c79c
9a395ed
29e3463
7cf22be
658aac8
bbd7d08
0bffcb7
4d009a9
1cbc04e
c65f2bf
63cef50
8ccf85f
12e907b
d582bb3
173edf3
f438754
725675c
d55f1f9
955177b
823fcca
3b00026
9627af2
045c21e
011031b
6cb7e31
fedc697
7652f59
098dd90
9559ab0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,101 @@ | ||||||
# Asynchronous media uploads | ||||||
Sending media to Matrix currently requires that clients first upload the media | ||||||
to the content repository and then send the event. This is a problem for some | ||||||
use cases, such as bridges that want to preserve message order, as reuploading | ||||||
a large file would block all messages. | ||||||
|
||||||
## Proposal | ||||||
This proposal proposes a way to send the event containing media before actually | ||||||
uploading the media, which would make the aformentioned bridge message order | ||||||
preservation possible without blocking all other messages behind a long upload. | ||||||
|
||||||
In the future, this new functionality could be used for streaming file | ||||||
transfers, as requested in [matrix-spec#432]. | ||||||
|
||||||
### Content repository behavior | ||||||
The proposal adds two new endpoints to the content repository API and modifies | ||||||
the download endpoint. | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
#### `POST /_matrix/media/v3/create` | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This endpoint, but not the newly-proposed
Suggested change
( |
||||||
Create a new MXC URI without content. Like `/upload`, this endpoint requires | ||||||
auth and returns the `content_uri` that can be used in events. | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
The request body should be an empty JSON object. In the future, the body could | ||||||
be used for metadata about the file, such as the mime type or access control | ||||||
settings (related: [MSC701]). | ||||||
|
||||||
To prevent spam, servers should limit calls to this endpoint if the user has | ||||||
created many media IDs without uploading any content to them. The error code | ||||||
for such a spam limit is `M_TOO_MANY_IDS_ASSIGNED`. Servers should also provide | ||||||
configuration options to let high-traffic clients like application services | ||||||
bypass these limits. The `rate_limited` flag in the appservice registration is | ||||||
one potential way to do this. | ||||||
|
||||||
The server may optionally enforce a maximum age for unused media IDs to delete | ||||||
media IDs when the client doesn't start the upload in time, or when the upload | ||||||
was interrupted and not resumed in time. The server should include the maximum | ||||||
timestamp to start the upload in the `unused_expires_at` field in the response | ||||||
JSON. The recommended default expiration for starting the upload is 1 minute. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit unsure about this. The resources taken up by an unused media ID is minimal (just a row in the DB), whereas failing to upload media in time means that the associated event forever links to broken data. The scenario I'm thinking of is dodgy/slow internet connection, where it wouldn't be unreasonable to struggle to upload the media in time. We could increase the recommended default expiry, but I'm not entirely sure what the expiry really buys in practice. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The expiry is to prevent clients from retrying/waiting forever for files that are never uploaded. Increasing the suggested time may be reasonable if the upload fails for any reason as it would prevent retrying if you're uploading a large file making that media ID stale. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've been thinking about this a bit, and I agree it would be good to support both requests for the media a) initially blocking until the media starts getting uploaded, and b) after a while returning an error code immediately if the media hasn't been uploaded yet. I guess the questions are:
Allowing late uploads means that every time a client comes to display the media it will retry, whereas if we return a 404 it could potentially cache that the media will never be uploaded. My hunch though is that clients won't ever bother caching error responses, so there's not much need to disallow late uploads? I don't think this is a huge issue either way ftr. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Web clients will cache the error response implicitly, particularly with aggressive cache-control headers, fwiw. A possible attack angle here is that a malicious user/client could reserve a ton of event IDs (hundreds per minute with current rate limits and how cheap this request is to call) and send those into a room. The clients in that room then valiantly try to download that media, exhausting the available connections on the media server while it sits there blocking for media. This is somewhat of a risk if the user were to upload a ton of media to a room today, however servers generally cache or keep the recently uploaded media close so can respond to an influx of download requests quickly, reducing the attack surface by enough to not be as much of a concern. To resolve this, maybe we don't use rate limits as our primary defence. We should still rate limit to avoid clients sending a billion requests per second, but we could additionally ask that servers employ an internal (undocumented/not-revealed) quota for the number of incomplete uploads a user can have. When the user exceeds that quota, the oldest incomplete upload(s) are expired such that they return immediate 404s to If the server uses a small enough quota (say, 10?), then at worst the user can cause fewer requests which contribute to the server's maximum open file limit, similar to if the client uploaded large files which force the connection to be held open for longer. This sort of quota system also allows clients to take an eternity to download and re-upload their files, which I think should be a feature of an async upload. A use case would be a non-messenger client which knows that an MXC URI is needed but the file upload is reliant on user input: it could take a few minutes for the user to find the file on their harddrive or they might realize that it's on a whole other machine and finish the upload the next day. For clarity: we wouldn't have an expiration timestamp with the above quota system. Also: I don't think quota is the word I'm after, but it's the word I'm using. (I am concerned about this enough to block FCP, ftr) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An invisible quota system might be more a problem than a solution, as if a client expects to be able to upload 50 concurrent files, while the server internally only allows 10, then it'll result in unstable/unusable UX. (E.g. those first 40 files erroring out on others' screens, and/or erroring on the uploader's side, depending how this is implemented) If there's a quota, the server should expose it, so clients can schedule uploads accordingly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's quite a bit different, yes. With this MSC the attacker can exhaust a server's resources trivially, per my comment above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we go with the quota idea, it seems like we would want to have an expiration timeout still, in case a client crashes and never finishes the upload. We don't want a dead upload to permanently cause the user's quota to be lowered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if we make the download/thumbnail endpoints return immediately when if the media isn't uploaded yet, but give some metadata? For example, This will allow clients to show something useful in these cases as well. I think this solution would prevent malicious actors from causing servers to keep open tons of connections as they can respond immediately, and close the connection. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's also a new potential usecase on the horizon where we'd want extremely long-lived media uploads (could take literal hours to finish uploading the media), so would be good to consider what a timeout-less system looks like I think. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that these concerns have been addressed in the latest version of the MSC. The main mechanisms are:
I think that exploring a timeout-less system should be done in a separate MSC. |
||||||
|
||||||
##### Example response | ||||||
```json | ||||||
{ | ||||||
"content_uri": "mxc://example.com/AQwafuaFswefuhsfAFAgsw", | ||||||
"unused_expires_at": 1647257217083 | ||||||
} | ||||||
``` | ||||||
|
||||||
#### `PUT /_matrix/media/v3/upload/{serverName}/{mediaId}` | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the intention that this endpoint support the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. In the future those details could be moved to the /create call request body, but that wasn't defined here |
||||||
Upload content to a MXC URI that was created earlier. If the endpoint is called | ||||||
with a media ID that already has content, the request should be rejected with | ||||||
the error code `M_CANNOT_OVERWRITE_MEDIA` and HTTP status code 409. The endpoint | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
should also reject upload requests from users other than the user who created | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
the media ID. This endpoint requires auth. | ||||||
|
||||||
If the upload is successful, an empty JSON object and status code 200 is | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
returned. If the serverName/mediaId combination is not known or not local, an | ||||||
`M_NOT_FOUND` error is returned. For other errors, such as file size, file type | ||||||
or user quota errors, the normal `/upload` rules apply. | ||||||
|
||||||
#### Changes to the `/download` endpoint | ||||||
Two new query parameters are added: `max_stall_ms` and `allow_stream`. | ||||||
|
||||||
* `max_stall_ms` is an integer that specifies the maximum number of milliseconds | ||||||
that the client is willing to wait for the upload to start (or finish, when | ||||||
streaming is not enabled). The default value is 10000 (10 seconds). | ||||||
* `allow_stream` is a boolean that specifies whether or not the content | ||||||
repository should stream data as it comes in from the sender. Defaults to | ||||||
`true`. | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
When another client tries to download media that is currently being uploaded, | ||||||
the content repository should send already uploaded data and stream new data as | ||||||
it comes in from the sender. | ||||||
|
||||||
If the upload is not started after the time limit, the content repository | ||||||
should return a `M_NOT_YET_UPLOADED` error. The error may include an additional | ||||||
tulir marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
`retry_after_ms` field to suggest when the client should try again. | ||||||
|
||||||
If streaming is not supported by the server, or if it's disabled with the query | ||||||
parameter, the content repository should act as if the upload has not started | ||||||
until the upload is finished. | ||||||
|
||||||
## Tradeoffs | ||||||
|
||||||
## Potential issues | ||||||
Other clients may time out the download if the sender takes too long to upload | ||||||
media. | ||||||
|
||||||
## Security considerations | ||||||
turt2live marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
## Unstable prefix | ||||||
While this MSC is not in a released version of the spec, implementations should | ||||||
use `fi.mau.msc2246` as a prefix and as a `unstable_features` flag in the | ||||||
`/versions` endpoint. | ||||||
|
||||||
* `POST /_matrix/media/unstable/fi.mau.msc2246/create` | ||||||
* `PUT /_matrix/media/unstable/fi.mau.msc2246/upload/{serverName}/{mediaId}` | ||||||
* `?fi.mau.msc2246.max_stall_ms` | ||||||
* `?fi.mau.msc2246.allow_stream` | ||||||
|
||||||
[matrix-spec#432]: https://github.com/matrix-org/matrix-spec/issues/432 | ||||||
[MSC701]: https://github.com/matrix-org/matrix-doc/issues/701 |
Uh oh!
There was an error while loading. Please reload this page.