Merged
5 changes: 5 additions & 0 deletions .changeset/little-balloons-sort.md
@@ -0,0 +1,5 @@
---
"@tus/server": minor
---

add Content-Type and Content-Disposition headers on GetHandler.send response
106 changes: 104 additions & 2 deletions packages/server/src/handlers/GetHandler.ts
@@ -1,14 +1,57 @@
import stream from 'node:stream'

import {BaseHandler} from './BaseHandler'
import {ERRORS, Upload} from '@tus/utils'

import type http from 'node:http'
import type {RouteHandler} from '../types'

export class GetHandler extends BaseHandler {
  paths: Map<string, RouteHandler> = new Map()

  /**
   * reMimeType is a RegExp that checks whether a MIME type is well formed
   * according to RFC 1341. It accepts a MIME type followed by optional
   * parameters, for example:
   *
   * ```
   * text/plain; charset=utf-8
   * ```
   *
   * See: https://datatracker.ietf.org/doc/html/rfc1341 (Page 6)
   */
  // Inside a quoted parameter value, `\\.` matches a backslash-escaped character.
  reMimeType =
    /^(?:application|audio|example|font|haptics|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))\/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ ]*;[ ]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|"(?:[^"\\]|\\.)*"))*)$/

Comment from Member:

Is it worth it to support parameters as well? We now have a very complex, slow regex with lots of backtracking. If it doesn't add much value I think I prefer to stay with the simple version? What do you think?

Comment from Contributor (Author):

I understand the caution. This latest regex is definitely more intimidating than the previous one, but it is not significantly slower: it takes less than 2 ms on average to check the full list of 2138 MIME types registered with IANA.

To answer your question: I think it is important, especially for text files, to preserve and transmit the character set that was used, so that the content renders properly when displayed inline.

If you like, we can switch to an intermediate version that does not try to match the parameters and only checks the MIME type part; after that, "anything can come".
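
A minimal sketch of what such an intermediate version might look like. This is not the regex from the PR: `tokenChars`, `reMimeTypePrefix` and `hasValidMediaType` are hypothetical names, and the list of top-level types is copied from `reMimeType` in the diff. Only the `type/subtype` part is validated; anything after the first `;` is accepted without inspection.

```typescript
// Hypothetical "intermediate" check: validate type/subtype only and let
// whatever follows the first ";" pass through unchecked.
const tokenChars = "[0-9A-Za-z!#$%&'*+.^_`|~-]+"

const reMimeTypePrefix = new RegExp(
  '^(?:application|audio|example|font|haptics|image|message|model|multipart|text|video|x-' +
    tokenChars +
    ')\\/' +
    tokenChars +
    '(?:[ ]*;.*)?$'
)

function hasValidMediaType(filetype: string): boolean {
  return reMimeTypePrefix.test(filetype)
}

console.log(hasValidMediaType('text/plain; charset=utf-8'))
console.log(hasValidMediaType('image_png'))
```

The trade-off is that malformed parameters would be forwarded in the `Content-Type` header as they were received.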

Benchmarks (screenshots not preserved): Simple Regex; Allow Parameters and RFC compliance Regex; Non Parameters match Regex (Intermediate Regex).
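
Since the benchmark screenshots did not survive, here is a rough sketch of how a comparison like this could be reproduced. Everything in it is an assumption for illustration: `sampleTypes` is a tiny hand-picked list rather than the IANA registry, `simpleRegex` is a stand-in rather than the simple version discussed in the thread, and `timeRegex` is a hypothetical helper.

```typescript
// Illustrative inputs only; the benchmark in the thread used the IANA list.
const sampleTypes: string[] = [
  'text/plain; charset=utf-8',
  'image/png',
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"',
  'image_png',
]

// Runs `re` against every input `rounds` times and reports the number of
// successful matches together with the elapsed wall-clock time.
function timeRegex(
  re: RegExp,
  inputs: string[],
  rounds: number
): {matches: number; elapsedMs: number} {
  let matches = 0
  const start = Date.now()
  for (let i = 0; i < rounds; i++) {
    for (const input of inputs) {
      if (re.test(input)) matches++
    }
  }
  return {matches, elapsedMs: Date.now() - start}
}

// Stand-in for a parameter-free check (an assumption, not the PR's regex).
const simpleRegex = /^[a-z]+\/[a-z0-9\-+.]+$/

const benchResult = timeRegex(simpleRegex, sampleTypes, 1000)
console.log(`${benchResult.matches} matches in ${benchResult.elapsedMs} ms`)
```

Passing each candidate regex to `timeRegex` with the same inputs gives comparable numbers, although millisecond timings from `Date.now()` are coarse and vary between runs.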

Comment from Member:

Parameters in Content-Type are also relevant for some media files. Although I am not sure whether it is necessary here, they can carry information about the codecs used in audio/video files (see tus/tusd#1194).

Tusd currently does not support parameters, but I plan on changing this. Its implementation currently also uses a regular expression to check the media type's validity, but my preferred solution is to replace it with Go's builtin media type parser. I'm not sure if such a method is easily available to you, but maybe it provides another perspective on this topic.
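
For comparison, a parser-style check can be written by hand in TypeScript without a regex over the whole string. The sketch below is only loosely in the spirit of the parser approach mentioned above and is not an existing library API: `parseMediaType`, `ParsedMediaType` and `isTokenChar` are hypothetical names. Unlike `reMimeType`, it does not restrict the top-level type to a fixed list.

```typescript
interface ParsedMediaType {
  type: string // lowercased "type/subtype"
  params: Record<string, string> // parameter names are lowercased
}

const isTokenChar = (ch: string): boolean =>
  /^[0-9A-Za-z!#$%&'*+.^_`|~-]$/.test(ch)

// Returns the parsed media type, or null if the input is malformed.
function parseMediaType(input: string): ParsedMediaType | null {
  let pos = 0
  const readToken = (): string => {
    const start = pos
    while (pos < input.length && isTokenChar(input.charAt(pos))) pos++
    return input.slice(start, pos)
  }
  const skipSpaces = (): void => {
    while (input.charAt(pos) === ' ' || input.charAt(pos) === '\t') pos++
  }

  const mainType = readToken()
  if (mainType === '' || input.charAt(pos) !== '/') return null
  pos++
  const subType = readToken()
  if (subType === '') return null

  const params: Record<string, string> = {}
  skipSpaces()
  while (pos < input.length) {
    if (input.charAt(pos) !== ';') return null
    pos++
    skipSpaces()
    const key = readToken()
    if (key === '' || input.charAt(pos) !== '=') return null
    pos++
    let value = ''
    if (input.charAt(pos) === '"') {
      // Quoted value: a backslash escapes the character that follows it.
      pos++
      let closed = false
      while (pos < input.length) {
        const ch = input.charAt(pos++)
        if (ch === '"') {
          closed = true
          break
        }
        if (ch === '\\') {
          if (pos >= input.length) return null
          value += input.charAt(pos++)
        } else {
          value += ch
        }
      }
      if (!closed) return null
    } else {
      value = readToken()
      if (value === '') return null
    }
    params[key.toLowerCase()] = value
    skipSpaces()
  }
  return {type: `${mainType}/${subType}`.toLowerCase(), params}
}

console.log(parseMediaType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"'))
```

A single left-to-right pass like this does not backtrack, which addresses the performance concern raised earlier in the thread, at the cost of more code to maintain.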

Comment from Member:

@jesusgoku wow thanks for taking the time to actually test it. Sounds good to me.


  /**
   * mimeInlineBrowserWhitelist is a set containing MIME types which should be
   * allowed to be rendered inline by the browser, instead of being forced to be
   * downloaded. For example, HTML or SVG files are not allowed, since they may
   * contain malicious JavaScript. In a similar fashion, PDF is not on this list
   * because PDF parsers commonly contain vulnerabilities which can be exploited.
   */
  mimeInlineBrowserWhitelist = new Set([
    'text/plain',

    'image/png',
    'image/jpeg',
    'image/gif',
    'image/bmp',
    'image/webp',

    'audio/wave',
    'audio/wav',
    'audio/x-wav',
    'audio/x-pn-wav',
    'audio/webm',
    'audio/ogg',

    'video/mp4',
    'video/webm',
    'video/ogg',

    'application/ogg',
  ])

  registerPath(path: string, handler: RouteHandler): void {
    this.paths.set(path, handler)
  }
@@ -45,12 +88,71 @@ export class GetHandler extends BaseHandler {
      throw ERRORS.FILE_NOT_FOUND
    }

    const {contentType, contentDisposition} = this.filterContentType(stats)

    // @ts-expect-error exists if supported
    const file_stream = await this.store.read(id)
    const headers = {
      'Content-Length': stats.offset,
      'Content-Type': contentType,
      'Content-Disposition': contentDisposition,
    }
    res.writeHead(200, headers)
    return stream.pipeline(file_stream, res, () => {
      // We have no need to handle streaming errors
    })
  }

  /**
   * filterContentType returns the values for the Content-Type and
   * Content-Disposition headers for a given upload. These values should be used
   * in responses for GET requests to ensure that only non-malicious file types
   * are shown directly in the browser. It extracts the file name and type
   * from the "filename" and "filetype" metadata fields.
   * See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
   */
  filterContentType(stats: Upload): {
    contentType: string
    contentDisposition: string
  } {
    let contentType: string
    let contentDisposition: string

    const {filetype, filename} = stats.metadata ?? {}

    if (filetype && this.reMimeType.test(filetype)) {
      // If the filetype from the metadata is well formed, we use it for the
      // Content-Type header. However, only whitelisted MIME types are
      // allowed to be shown inline in the browser.
      contentType = filetype

      if (this.mimeInlineBrowserWhitelist.has(filetype)) {
        contentDisposition = 'inline'
      } else {
        contentDisposition = 'attachment'
      }
    } else {
      // If the filetype from the metadata is not well formed, we use a
      // default type and force the browser to download the content.
      contentType = 'application/octet-stream'
      contentDisposition = 'attachment'
    }

    // Add a filename to Content-Disposition if one is available in the metadata
    if (filename) {
      contentDisposition += `; filename=${this.quote(filename)}`
    }

    return {
      contentType,
      contentDisposition,
    }
  }

  /**
   * Convert a string to a quoted-string literal, escaping any double quotes
   */
  quote(value: string) {
    return `"${value.replace(/"/g, '\\"')}"`
  }
}
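
In `filterContentType` above, the whitelist lookup uses the full `filetype` string, so a value that carries parameters, such as `text/plain; charset=utf-8`, passes `reMimeType` but is not found in the set and ends up as `attachment`. The sketch below shows one possible way to look up only the base type; it also escapes backslashes when quoting. It is an illustration under stated assumptions, not part of this PR: `inlineTypes`, `baseMediaType`, `dispositionFor` and `quoteHeaderValue` are hypothetical names, and lowercasing the type is an added assumption.

```typescript
// Hypothetical, shortened stand-in for mimeInlineBrowserWhitelist.
const inlineTypes = new Set(['text/plain', 'image/png'])

// Strip parameters so that "text/plain; charset=utf-8" becomes "text/plain".
function baseMediaType(filetype: string): string {
  const semicolon = filetype.indexOf(';')
  const base = semicolon === -1 ? filetype : filetype.slice(0, semicolon)
  return base.trim().toLowerCase()
}

function dispositionFor(filetype: string): 'inline' | 'attachment' {
  return inlineTypes.has(baseMediaType(filetype)) ? 'inline' : 'attachment'
}

// Escape both backslashes and double quotes inside the quoted string.
function quoteHeaderValue(value: string): string {
  return `"${value.replace(/[\\"]/g, '\\$&')}"`
}

console.log(dispositionFor('text/plain; charset=utf-8'))
console.log(quoteHeaderValue('my "pet".png'))
```

The full `filetype`, parameters included, could still be sent as `Content-Type`; only the whitelist decision would use the base type.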
97 changes: 97 additions & 0 deletions packages/server/test/GetHandler.test.ts
@@ -108,11 +108,108 @@ describe('GetHandler', () => {
      assert.equal(res.statusCode, 200)
      // TODO: this is the get handler but Content-Length is only sent in 204 OPTIONS requests?
      // assert.equal(res.getHeader('Content-Length'), size)

      assert.equal(res.getHeader('Content-Type'), 'application/octet-stream')
      assert.equal(res.getHeader('Content-Disposition'), 'attachment')

      assert.equal(store.getUpload.calledOnceWith(fileId), true)
      assert.equal(store.read.calledOnceWith(fileId), true)
    })
  })

  describe('filterContentType', () => {
    it('should return default headers value without metadata', () => {
      const fakeStore = sinon.stub(new DataStore())
      const handler = new GetHandler(fakeStore, serverOptions)
      const size = 512
      const upload = new Upload({id: '1234', offset: size, size})

      const res = handler.filterContentType(upload)

      assert.deepEqual(res, {
        contentType: 'application/octet-stream',
        contentDisposition: 'attachment',
      })
    })

    it('should return headers allow render in browser when filetype is in whitelist', () => {
      const fakeStore = sinon.stub(new DataStore())
      const handler = new GetHandler(fakeStore, serverOptions)
      const size = 512
      const upload = new Upload({
        id: '1234',
        offset: size,
        size,
        metadata: {filetype: 'image/png', filename: 'pet.png'},
      })

      const res = handler.filterContentType(upload)

      assert.deepEqual(res, {
        contentType: 'image/png',
        contentDisposition: 'inline; filename="pet.png"',
      })
    })

    it('should return headers force download when filetype is not in whitelist', () => {
      const fakeStore = sinon.stub(new DataStore())
      const handler = new GetHandler(fakeStore, serverOptions)
      const size = 512
      const upload = new Upload({
        id: '1234',
        offset: size,
        size,
        metadata: {filetype: 'application/zip', filename: 'pets.zip'},
      })

      const res = handler.filterContentType(upload)

      assert.deepEqual(res, {
        contentType: 'application/zip',
        contentDisposition: 'attachment; filename="pets.zip"',
      })
    })

    it('should return headers when filetype is not a valid form', () => {
      const fakeStore = sinon.stub(new DataStore())
      const handler = new GetHandler(fakeStore, serverOptions)
      const size = 512
      const upload = new Upload({
        id: '1234',
        offset: size,
        size,
        metadata: {filetype: 'image_png', filename: 'pet.png'},
      })

      const res = handler.filterContentType(upload)

      assert.deepEqual(res, {
        contentType: 'application/octet-stream',
        contentDisposition: 'attachment; filename="pet.png"',
      })
    })
  })

  describe('quote', () => {
    it('should return simple quoted string', () => {
      const fakeStore = sinon.stub(new DataStore())
      const handler = new GetHandler(fakeStore, serverOptions)

      const res = handler.quote('pet.png')

      assert.equal(res, '"pet.png"')
    })

    it('should return quoted string when include quotes', () => {
      const fakeStore = sinon.stub(new DataStore())
      const handler = new GetHandler(fakeStore, serverOptions)

      const res = handler.quote('"pet.png"')

      assert.equal(res, '"\\"pet.png\\""')
    })
  })

  describe('registerPath()', () => {
    it('should call registered path handler', async () => {
      const fakeStore = sinon.stub(new DataStore())