feat: support unicode in object names by ferhatelmas · Pull Request #875 · supabase/storage

ferhatelmas · 2026-02-26T11:55:20Z

What kind of change does this PR introduce?

feature

What is the current behavior?

Only a limited set of characters is permitted in object names.

What is the new behavior?

Unicode is supported across the board: object names may contain any Unicode character that is also valid in XML 1.0 documents.

Additional context

#133

coderabbitai · 2026-02-26T11:55:30Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e5c96d57-5ea8-43b2-9cf5-52bf67b8e633

📥 Commits

Reviewing files that changed from the base of the PR and between cadda24 and c473769.

📒 Files selected for processing (3)

src/storage/object.ts
src/storage/protocols/s3/s3-handler.ts
src/test/s3-protocol.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/storage/protocols/s3/s3-handler.ts

📝 Walkthrough

Summary by CodeRabbit

New Features
- Wider Unicode support for object names, optional source-version copy, XML numeric-entity decoding, webhook disable filtering, and safer S3 CopySource encoding. Signing/URL routines tightened.
Bug Fixes
- Clearer "invalid object name" errors, rejection of disallowed Unicode/non-characters, fixes to continuation-token and signing edge cases (including double-encoded paths).
Tests
- Large expansion of Unicode, signed-URL, copy/multipart, webhook-filtering, TUS and test-hygiene coverage.

Walkthrough

Adds a tenant SQL migration enforcing object-name constraints; decodes XML numeric entities in the XML parser; tightens key validation to reject XML-invalid and Unicode non-characters; centralizes S3 CopySource encoding/parsing and adds validation for versionId; introduces versioned continuation tokens and robust decoding; changes signed URL signing APIs and adds request-path matching utilities; adjusts DB error messaging for the new constraint; and expands tests across Unicode keys, CopySource, signed URLs, continuation tokens, and related flows.
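
The key-validation rule described in the walkthrough can be illustrated with a minimal TypeScript sketch. It assumes the rule is "the XML 1.0 `Char` production minus Unicode non-characters"; the function names are hypothetical, this is not the validation code from the PR, and the real constraint may well be stricter (for example around control characters).

```typescript
// Hypothetical sketch: not the PR's actual validation code.
// XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(cp: number): boolean {
  return (
    cp === 0x9 ||
    cp === 0xa ||
    cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  )
}

// Unicode non-characters: U+FDD0..U+FDEF plus the last two code points of every plane.
function isNonCharacter(cp: number): boolean {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe
}

// Walks the name by code point. A lone surrogate is reported by codePointAt as
// its own code unit value, which falls outside the XML Char ranges and is rejected.
function isAllowedObjectName(name: string): boolean {
  for (let i = 0; i < name.length; i++) {
    const cp = name.codePointAt(i)
    if (cp === undefined || !isXmlChar(cp) || isNonCharacter(cp)) return false
    if (cp > 0xffff) i++ // skip the low surrogate of a valid pair
  }
  return true
}
```

Under these assumptions a name such as `日本語/файл.txt` passes, while names containing U+0000, U+FDD0, or an unpaired surrogate are rejected.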

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP as S3 Handler
    participant Parser as parseCopySource
    participant Storage as Storage Service
    participant DB as Database
    participant Backend as Object Backend

    Client->>HTTP: COPY request (CopySource header, Destination)
    HTTP->>Parser: parseCopySource(CopySource)
    alt malformed
        Parser-->>HTTP: throw InvalidParameter / MissingParameter
        HTTP-->>Client: 400
    else parsed
        Parser-->>HTTP: {bucket, key, sourceVersion?}
        HTTP->>Storage: copyObject(srcBucket, srcKey, srcVersion?, destBucket, destKey)
        Storage->>DB: SELECT version WHERE bucket=srcBucket AND name=srcKey
        DB-->>Storage: {version, metadata} or not found
        alt sourceVersion provided
            Storage->>Storage: compare provided sourceVersion vs DB.version
            alt mismatch
                Storage-->>HTTP: throw NoSuchKey / InvalidParameter
                HTTP-->>Client: 404/400
            else match
                Storage->>Backend: CopyObject with encoded CopySource (includes versionId)
                Backend-->>Storage: {eTag, lastModified}
                Storage-->>HTTP: success
                HTTP-->>Client: 200 OK
            end
        else no sourceVersion
            Storage->>Backend: CopyObject using DB.version in CopySource
            Backend-->>Storage: {eTag, lastModified}
            Storage-->>HTTP: success
            HTTP-->>Client: 200 OK
        end
    end
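
The parsing step at the top of the diagram can be sketched as follows. This is an illustration only: `parseCopySourceSketch` is a hypothetical name, plain `Error` objects stand in for the `InvalidParameter` / `MissingParameter` errors, and the header is assumed to have the shape `[/]bucket/key[?versionId=...]` with percent-encoded segments.

```typescript
// Hypothetical sketch of CopySource parsing; not the PR's parseCopySource.
interface ParsedCopySource {
  bucket: string
  key: string
  sourceVersion?: string
}

function parseCopySourceSketch(header: string): ParsedCopySource {
  if (header === '') throw new Error('MissingParameter: CopySource')

  // Split off an optional "?versionId=..." suffix. A literal "?" inside the key
  // is assumed to arrive percent-encoded as %3F.
  const queryIndex = header.indexOf('?')
  const rawPath = queryIndex === -1 ? header : header.slice(0, queryIndex)
  const rawQuery = queryIndex === -1 ? '' : header.slice(queryIndex + 1)

  // Tolerate one leading slash, then require "bucket/key" with both parts non-empty.
  const path = rawPath.charAt(0) === '/' ? rawPath.slice(1) : rawPath
  const slash = path.indexOf('/')
  if (slash <= 0 || slash === path.length - 1) {
    throw new Error('InvalidParameter: CopySource')
  }

  const parsed: ParsedCopySource = { bucket: '', key: '' }
  try {
    parsed.bucket = decodeURIComponent(path.slice(0, slash))
    parsed.key = decodeURIComponent(path.slice(slash + 1))
    for (const pair of rawQuery.split('&')) {
      if (pair.indexOf('versionId=') === 0) {
        parsed.sourceVersion = decodeURIComponent(pair.slice('versionId='.length))
      }
    }
  } catch {
    // Malformed percent-encoding surfaces as a request error, not a URIError.
    throw new Error('InvalidParameter: CopySource')
  }
  return parsed
}
```

Decoding bucket and key separately keeps an encoded `%2F` inside the key from being mistaken for the bucket/key separator.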


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/storage/object.ts (1)

876-894: ⚠️ Potential issue | 🟠 Major

Backward compatibility break in continuation token decoding corrupts pagination cursors

Line 889 unconditionally decodes all token values with decodeURIComponent(). Old tokens stored raw unescaped values (e.g., a%2Fb as a literal string), but the new decoder treats %2F as an encoded /, silently transforming cursor positions and causing list operations to skip or duplicate results.

Proposed fix (version-prefix encoded tokens)

 function encodeContinuationToken(tokenInfo: ContinuationToken) {
   const result: string[] = []
   for (const [k, v] of Object.entries(CONTINUATION_TOKEN_PART_MAP)) {
     const value = tokenInfo[v]
     if (value) {
-      result.push(`${k}:${encodeURIComponent(value)}`)
+      result.push(`${k}:e:${encodeURIComponent(value)}`)
     }
   }
   return Buffer.from(result.join('\n')).toString('base64')
 }

 function decodeContinuationToken(token: string): ContinuationToken {
@@
-    let value = partMatch[2]
-    try {
-      value = decodeURIComponent(value)
-    } catch {
-      // Backward compatibility: previously cursor values were stored unescaped.
-    }
+    let value = partMatch[2]
+    if (value.startsWith('e:')) {
+      value = decodeURIComponent(value.slice(2))
+    }
     result[CONTINUATION_TOKEN_PART_MAP[partMatch[1]]] = value
   }
   return result
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/storage/object.ts` around lines 876 - 894, The decoder
decodeContinuationToken currently always runs decodeURIComponent on every part
which corrupts legacy unescaped tokens; update decodeContinuationToken to
support a version prefix: detect and strip a token version prefix (e.g., "v1:"
vs legacy/no-prefix), and only apply decodeURIComponent for the new version (v1)
while preserving legacy behavior by leaving values raw (no decoding) when the
token has no version prefix; keep using CONTINUATION_TOKEN_PART_MAP and the same
part parsing logic, and ensure the function throws the same Invalid continuation
token error for malformed parts.
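
A minimal sketch of the version-prefix approach described in the prompt above. It is not the PR's implementation: the part map is hypothetical (the real `CONTINUATION_TOKEN_PART_MAP` is not shown in this thread), the `v1:` prefix follows the example in the prompt, and the base64 wrapping is omitted so the payload format stays visible.

```typescript
// Hypothetical field map standing in for CONTINUATION_TOKEN_PART_MAP.
type Cursor = { startAfter?: string; sortColumnAfter?: string }
const PARTS: ReadonlyArray<[string, keyof Cursor]> = [
  ['l', 'startAfter'],
  ['s', 'sortColumnAfter'],
]
const V1_PREFIX = 'v1:'

// New tokens carry a version prefix and percent-encoded values.
function encodeCursorPayload(cursor: Cursor): string {
  const lines: string[] = []
  for (const [short, field] of PARTS) {
    const value = cursor[field]
    if (value) lines.push(`${short}:${encodeURIComponent(value)}`)
  }
  return V1_PREFIX + lines.join('\n')
}

function decodeCursorPayload(payload: string): Cursor {
  // Legacy payloads start with a one-letter part key, so they never begin with "v1:".
  const versioned = payload.indexOf(V1_PREFIX) === 0
  const body = versioned ? payload.slice(V1_PREFIX.length) : payload
  const result: Cursor = {}
  if (versioned && body === '') return result
  for (const line of body.split('\n')) {
    const sep = line.indexOf(':')
    if (sep === -1) throw new Error('Invalid continuation token')
    const part = PARTS.find(([short]) => short === line.slice(0, sep))
    if (part === undefined) throw new Error('Invalid continuation token')
    const raw = line.slice(sep + 1)
    try {
      // Only versioned tokens are percent-decoded; legacy values stay raw.
      result[part[1]] = versioned ? decodeURIComponent(raw) : raw
    } catch {
      throw new Error('Invalid continuation token')
    }
  }
  return result
}
```

With this split, a legacy payload `l:a%2Fb` keeps the literal value `a%2Fb`, while the versioned payload `v1:l:a%252Fb` decodes to the same `a%2Fb`.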

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/internal/errors/codes.ts`:
- Line 327: The error message in ERRORS.InvalidKey uses encodeURIComponent(key)
which can throw for unpaired UTF-16 surrogates; update ERRORS.InvalidKey (where
the message is constructed) to produce a safe-encoded key: prefer using
key.toWellFormed() if available, otherwise normalize/replace unpaired surrogates
(or fall back to a try/catch around encodeURIComponent and sanitize the key to
replace invalid code units with U+FFFD) and then call encodeURIComponent on the
sanitized string before embedding it in the message so URIError cannot escape
and turn a 400 into a 500.
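
The sanitizing fallback suggested above can be sketched like this. The helper names are hypothetical and the sketch does not reproduce the actual `ERRORS.InvalidKey` code; it only shows one way to replace unpaired surrogates with U+FFFD before calling `encodeURIComponent`, for runtimes where `String.prototype.toWellFormed` is unavailable.

```typescript
// Hypothetical fallback for String.prototype.toWellFormed():
// replaces every unpaired UTF-16 surrogate with U+FFFD.
function toWellFormedFallback(input: string): string {
  let out = ''
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i)
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: valid only when followed by a low surrogate.
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += input.charAt(i) + input.charAt(i + 1)
        i++
      } else {
        out += '\uFFFD'
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      out += '\uFFFD'
    } else {
      out += input.charAt(i)
    }
  }
  return out
}

// encodeURIComponent throws URIError on lone surrogates, so sanitize first.
function safeEncodeKey(key: string): string {
  return encodeURIComponent(toWellFormedFallback(key))
}
```

Sanitizing before encoding means building the error message itself cannot throw, so an invalid key stays a 400 response.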


ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 46444ed6-2080-41d2-80b4-c88c86a791d1

📥 Commits

Reviewing files that changed from the base of the PR and between d8dba53 and cde444c.

📒 Files selected for processing (21)

migrations/tenant/57-unicode-object-names.sql
package.json
src/http/plugins/xml.ts
src/internal/database/migrations/types.ts
src/internal/errors/codes.ts
src/scripts/migrations-types.ts
src/storage/backend/s3/adapter.ts
src/storage/database/knex.ts
src/storage/limits.ts
src/storage/object.ts
src/storage/protocols/s3/s3-handler.ts
src/test/bucket.test.ts
src/test/common.ts
src/test/limits.test.ts
src/test/object-list-v2.test.ts
src/test/object.test.ts
src/test/s3-adapter.test.ts
src/test/s3-protocol.test.ts
src/test/test-hygiene.test.ts
src/test/tus.test.ts
src/test/xml-plugin.test.ts

src/internal/errors/codes.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/test/xml-plugin.test.ts`:
- Around line 3-15: Add tests to ensure decodeXmlNumericEntities rejects
XML-forbidden but in-range numeric entities (so they remain un-decoded). In the
existing test suite for decodeXmlNumericEntities add negative cases asserting
inputs like '&#0;' (and '&#000;' hex/dec variants), surrogate halves such as
'&#xD800;' (and decimal '&#55296;'), and noncharacters like '&#xFFFF;' (and
decimal '&#65535;') are returned unchanged; reference the
decodeXmlNumericEntities function to locate the implementation to validate
behavior and ensure both hex (&#x...) and decimal (&#...;) forms are covered.
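
To make the requested negative cases concrete, here is a minimal sketch of a decoder with that behavior. It is an assumption-laden illustration, not the `decodeXmlNumericEntities` implementation from `src/http/plugins/xml.ts`: the function name is hypothetical, and "XML-forbidden" is taken to mean "outside the XML 1.0 `Char` production".

```typescript
// Hypothetical sketch; not the PR's decodeXmlNumericEntities.
function isXmlCharCodePoint(cp: number): boolean {
  return (
    cp === 0x9 ||
    cp === 0xa ||
    cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  )
}

// Decodes &#NNN; and &#xHHH; references. References to code points that XML 1.0
// forbids (NUL, surrogate halves, U+FFFE, U+FFFF, out-of-range values) are left as-is.
function decodeNumericEntitiesSketch(input: string): string {
  return input.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, (whole: string, body: string) => {
    const cp = body.charAt(0) === 'x' ? parseInt(body.slice(1), 16) : parseInt(body, 10)
    return isXmlCharCodePoint(cp) ? String.fromCodePoint(cp) : whole
  })
}
```

Checking the range before calling `String.fromCodePoint` also avoids the `RangeError` it would raise for values above U+10FFFF.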

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: af25095c-594e-4a8f-afaa-599e92b3eee5

📥 Commits

Reviewing files that changed from the base of the PR and between cde444c and fc16aa6.

📒 Files selected for processing (21)

migrations/tenant/57-unicode-object-names.sql
package.json
src/http/plugins/xml.ts
src/internal/database/migrations/types.ts
src/internal/errors/codes.ts
src/scripts/migrations-types.ts
src/storage/backend/s3/adapter.ts
src/storage/database/knex.ts
src/storage/limits.ts
src/storage/object.ts
src/storage/protocols/s3/s3-handler.ts
src/test/bucket.test.ts
src/test/common.ts
src/test/limits.test.ts
src/test/object-list-v2.test.ts
src/test/object.test.ts
src/test/s3-adapter.test.ts
src/test/s3-protocol.test.ts
src/test/test-hygiene.test.ts
src/test/tus.test.ts
src/test/xml-plugin.test.ts

🚧 Files skipped from review as they are similar to previous changes (9)

src/test/test-hygiene.test.ts
src/test/object-list-v2.test.ts
src/test/common.ts
migrations/tenant/57-unicode-object-names.sql
src/storage/object.ts
src/storage/backend/s3/adapter.ts
src/internal/errors/codes.ts
src/http/plugins/xml.ts
src/test/limits.test.ts

src/test/xml-plugin.test.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/storage/protocols/s3/s3-handler.ts`:
- Around line 1448-1501: Change decodeLegacyContinuationToken and
decodeContinuationToken to accept a parameter name (e.g.
decodeContinuationToken(token: string, parameterName: string)) and replace all
plain throws like throw new Error('Invalid continuation token') with the
standardized request error from your ERRORS collection, passing the parameter
name (e.g. throw ERRORS.InvalidArgument(parameterName) or the appropriate
ERRORS.* factory used elsewhere). Update callers (e.g. where
KeyMarker/UploadIdMarker are decoded) to pass the parameter name string such as
'KeyMarker' or 'UploadIdMarker' when calling
decodeContinuationToken/decodeLegacyContinuationToken so malformed client tokens
produce a 4xx request error instead of a generic exception.
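
The pattern requested here, threading the parameter name through so a malformed marker becomes a request error, can be sketched as below. Everything in it is hypothetical: `invalidArgument` stands in for whichever `ERRORS.*` factory the codebase uses, and the marker is assumed to be a plain percent-encoded string rather than the real token format.

```typescript
// Hypothetical request-error shape; the real ERRORS factories are not shown in this thread.
interface RequestError extends Error {
  code: string
  httpStatusCode: number
}

function invalidArgument(parameterName: string): RequestError {
  return Object.assign(new Error(`Invalid argument: ${parameterName}`), {
    code: 'InvalidArgument',
    httpStatusCode: 400,
  })
}

// Decodes a client-supplied marker, reporting failures against the named parameter.
function decodeMarkerSketch(token: string, parameterName: string): string {
  if (token === '') throw invalidArgument(parameterName)
  try {
    return decodeURIComponent(token)
  } catch {
    // URIError from malformed percent-encoding becomes a 4xx request error.
    throw invalidArgument(parameterName)
  }
}
```

A caller would pass the name of the query parameter it is decoding, for example `decodeMarkerSketch(value, 'KeyMarker')`, so the error response can point at the offending input.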

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 773df860-6e8f-4ac9-bba0-96890cba39d8

📥 Commits

Reviewing files that changed from the base of the PR and between 0121419 and ff9670e.

📒 Files selected for processing (20)

migrations/tenant/57-unicode-object-names.sql
src/http/routes/object/getSignedObject.ts
src/http/routes/object/getSignedUploadURL.ts
src/http/routes/render/renderSignedImage.ts
src/http/routes/signed-url.ts
src/storage/backend/s3/adapter.ts
src/storage/events/lifecycle/webhook-filter.ts
src/storage/events/lifecycle/webhook.ts
src/storage/object.ts
src/storage/protocols/s3/s3-handler.ts
src/test/limits.test.ts
src/test/object-list-v2.test.ts
src/test/object.test.ts
src/test/render-routes.test.ts
src/test/s3-adapter.test.ts
src/test/s3-protocol.test.ts
src/test/signed-url-route.test.ts
src/test/tus.test.ts
src/test/webhook-filter.test.ts
src/test/webhooks.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

src/storage/backend/s3/adapter.ts
src/test/limits.test.ts

src/storage/protocols/s3/s3-handler.ts

This allows all Unicode characters that are also valid in XML 1.0 documents.
See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
See: https://www.w3.org/TR/REC-xml/#charsets

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

ferhatelmas mentioned this pull request Feb 26, 2026

fix: allow Unicode object names #605

Closed

ferhatelmas force-pushed the ferhat/name branch 7 times, most recently from 3dd9e78 to 60b9538, March 5, 2026 10:20

ferhatelmas marked this pull request as ready for review March 5, 2026 13:15

ferhatelmas requested a review from a team as a code owner March 5, 2026 13:15

ferhatelmas changed the title ~~feat: support utf8 in object names~~ feat: support unicode in object names Mar 5, 2026

coderabbitai bot reviewed Mar 5, 2026

src/internal/errors/codes.ts (outdated, resolved)

ferhatelmas force-pushed the ferhat/name branch from cde444c to fc16aa6, March 5, 2026 15:43

coderabbitai bot reviewed Mar 5, 2026

src/test/xml-plugin.test.ts (resolved)

ferhatelmas force-pushed the ferhat/name branch 3 times, most recently from ff9670e to bf8efff, March 6, 2026 09:41

coderabbitai bot reviewed Mar 6, 2026

src/storage/protocols/s3/s3-handler.ts (resolved)

mlatief and others added 11 commits March 6, 2026 14:42

fix: allow Unicode object names

cf19b95

This allows all Unicode characters that are also valid in XML 1.0 documents.
See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
See: https://www.w3.org/TR/REC-xml/#charsets

test: add unicode key and xml entity edge-case coverage

d3297cb

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: more tests and edge case handling

7d836aa

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: make migration idempotent

6077783

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: more tests for gaps

403e49e

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: oriole compat

b33ef6e

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: lint

0d62cd7

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: add delete many test

2682b8f

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: decode logic in xml

287e32d

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: make delete many example more complex

79b1b38

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: more tests for gaps

e5fc7a4

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

ferhatelmas added 13 commits March 6, 2026 14:48

fix: invalid key surrogates

0b21198

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: sign and more coverage

cfb6f99

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: more batch sign coverage

ffd0bdc

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: cleanup data after run

0e59725

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: add tests for webhooks

003a454

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: drop dead param for sign

d67e442

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: s3 copy source

1dcb1fd

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: backward compat for list continuation

43c4f21

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: backward compat for s3 continuation

f9727c0

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: control chars in migration

8b104c8

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: explicit decoding

38bdee2

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: more test coverage

afd6f80

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: post rebase

e9d8095

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

ferhatelmas force-pushed the ferhat/name branch from bf8efff to e9d8095, March 6, 2026 13:53

ferhatelmas added 2 commits March 6, 2026 15:09

fix: test data dependence

cadda24

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

fix: error shape

c473769

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>

ferhatelmas wants to merge 26 commits into master from ferhat/name

ferhatelmas commented Feb 26, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 26, 2026 •

edited

Loading

Reviews paused

Summary by CodeRabbit

Walkthrough

Sequence Diagram(s)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants
