Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a tenant SQL migration enforcing object-name constraints; decodes XML numeric entities in the XML parser; tightens key validation to reject XML-invalid and Unicode non-characters; centralizes S3 CopySource encoding/parsing and adds validation for versionId; introduces versioned continuation tokens and robust decoding; changes signed URL signing APIs and adds request-path matching utilities; adjusts DB error messaging for the new constraint; and expands tests across Unicode keys, CopySource, signed URLs, continuation tokens, and related flows. Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTP as S3 Handler
participant Parser as parseCopySource
participant Storage as Storage Service
participant DB as Database
participant Backend as Object Backend
Client->>HTTP: COPY request (CopySource header, Destination)
HTTP->>Parser: parseCopySource(CopySource)
alt malformed
Parser-->>HTTP: throw InvalidParameter / MissingParameter
HTTP-->>Client: 400
else parsed
Parser-->>HTTP: {bucket, key, sourceVersion?}
HTTP->>Storage: copyObject(srcBucket, srcKey, srcVersion?, destBucket, destKey)
Storage->>DB: SELECT version WHERE bucket=srcBucket AND name=srcKey
DB-->>Storage: {version, metadata} or not found
alt sourceVersion provided
Storage->>Storage: compare provided sourceVersion vs DB.version
alt mismatch
Storage-->>HTTP: throw NoSuchKey / InvalidParameter
HTTP-->>Client: 404/400
else match
Storage->>Backend: CopyObject with encoded CopySource (includes versionId)
Backend-->>Storage: {eTag, lastModified}
Storage-->>HTTP: success
HTTP-->>Client: 200 OK
end
else no sourceVersion
Storage->>Backend: CopyObject using DB.version in CopySource
Backend-->>Storage: {eTag, lastModified}
Storage-->>HTTP: success
HTTP-->>Client: 200 OK
end
end
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3dd9e78 to
60b9538
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/object.ts (1)
876-894:⚠️ Potential issue | 🟠 MajorBackward compatibility break in continuation token decoding corrupts pagination cursors
Line 889 unconditionally decodes all token values with
decodeURIComponent(). Old tokens stored raw unescaped values (e.g.,a%2Fbas a literal string), but the new decoder treats%2Fas an encoded/, silently transforming cursor positions and causing list operations to skip or duplicate results.Proposed fix (version-prefix encoded tokens)
function encodeContinuationToken(tokenInfo: ContinuationToken) { const result: string[] = [] for (const [k, v] of Object.entries(CONTINUATION_TOKEN_PART_MAP)) { const value = tokenInfo[v] if (value) { - result.push(`${k}:${encodeURIComponent(value)}`) + result.push(`${k}:e:${encodeURIComponent(value)}`) } } return Buffer.from(result.join('\n')).toString('base64') } function decodeContinuationToken(token: string): ContinuationToken { @@ - let value = partMatch[2] - try { - value = decodeURIComponent(value) - } catch { - // Backward compatibility: previously cursor values were stored unescaped. - } + let value = partMatch[2] + if (value.startsWith('e:')) { + value = decodeURIComponent(value.slice(2)) + } result[CONTINUATION_TOKEN_PART_MAP[partMatch[1]]] = value } return result }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/object.ts` around lines 876 - 894, The decoder decodeContinuationToken currently always runs decodeURIComponent on every part which corrupts legacy unescaped tokens; update decodeContinuationToken to support a version prefix: detect and strip a token version prefix (e.g., "v1:" vs legacy/no-prefix), and only apply decodeURIComponent for the new version (v1) while preserving legacy behavior by leaving values raw (no decoding) when the token has no version prefix; keep using CONTINUATION_TOKEN_PART_MAP and the same part parsing logic, and ensure the function throws the same Invalid continuation token error for malformed parts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/internal/errors/codes.ts`:
- Line 327: The error message in ERRORS.InvalidKey uses encodeURIComponent(key)
which can throw for unpaired UTF-16 surrogates; update ERRORS.InvalidKey (where
the message is constructed) to produce a safe-encoded key: prefer using
key.toWellFormed() if available, otherwise normalize/replace unpaired surrogates
(or fall back to a try/catch around encodeURIComponent and sanitize the key to
replace invalid code units with U+FFFD) and then call encodeURIComponent on the
sanitized string before embedding it in the message so URIError cannot escape
and turn a 400 into a 500.
---
Outside diff comments:
In `@src/storage/object.ts`:
- Around line 876-894: The decoder decodeContinuationToken currently always runs
decodeURIComponent on every part which corrupts legacy unescaped tokens; update
decodeContinuationToken to support a version prefix: detect and strip a token
version prefix (e.g., "v1:" vs legacy/no-prefix), and only apply
decodeURIComponent for the new version (v1) while preserving legacy behavior by
leaving values raw (no decoding) when the token has no version prefix; keep
using CONTINUATION_TOKEN_PART_MAP and the same part parsing logic, and ensure
the function throws the same Invalid continuation token error for malformed
parts.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 46444ed6-2080-41d2-80b4-c88c86a791d1
📒 Files selected for processing (21)
migrations/tenant/57-unicode-object-names.sqlpackage.jsonsrc/http/plugins/xml.tssrc/internal/database/migrations/types.tssrc/internal/errors/codes.tssrc/scripts/migrations-types.tssrc/storage/backend/s3/adapter.tssrc/storage/database/knex.tssrc/storage/limits.tssrc/storage/object.tssrc/storage/protocols/s3/s3-handler.tssrc/test/bucket.test.tssrc/test/common.tssrc/test/limits.test.tssrc/test/object-list-v2.test.tssrc/test/object.test.tssrc/test/s3-adapter.test.tssrc/test/s3-protocol.test.tssrc/test/test-hygiene.test.tssrc/test/tus.test.tssrc/test/xml-plugin.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/test/xml-plugin.test.ts`:
- Around line 3-15: Add tests to ensure decodeXmlNumericEntities rejects
XML-forbidden but in-range numeric entities (so they remain un-decoded). In the
existing test suite for decodeXmlNumericEntities add negative cases asserting
inputs like '�' (and '�' hex/dec variants), surrogate halves such as
'�' (and decimal '�'), and noncharacters like '' (and
decimal '') are returned unchanged; reference the
decodeXmlNumericEntities function to locate the implementation to validate
behavior and ensure both hex (&#x...) and decimal (&#...;) forms are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: af25095c-594e-4a8f-afaa-599e92b3eee5
📒 Files selected for processing (21)
migrations/tenant/57-unicode-object-names.sqlpackage.jsonsrc/http/plugins/xml.tssrc/internal/database/migrations/types.tssrc/internal/errors/codes.tssrc/scripts/migrations-types.tssrc/storage/backend/s3/adapter.tssrc/storage/database/knex.tssrc/storage/limits.tssrc/storage/object.tssrc/storage/protocols/s3/s3-handler.tssrc/test/bucket.test.tssrc/test/common.tssrc/test/limits.test.tssrc/test/object-list-v2.test.tssrc/test/object.test.tssrc/test/s3-adapter.test.tssrc/test/s3-protocol.test.tssrc/test/test-hygiene.test.tssrc/test/tus.test.tssrc/test/xml-plugin.test.ts
🚧 Files skipped from review as they are similar to previous changes (9)
- src/test/test-hygiene.test.ts
- src/test/object-list-v2.test.ts
- src/test/common.ts
- migrations/tenant/57-unicode-object-names.sql
- src/storage/object.ts
- src/storage/backend/s3/adapter.ts
- src/internal/errors/codes.ts
- src/http/plugins/xml.ts
- src/test/limits.test.ts
ff9670e to
bf8efff
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/protocols/s3/s3-handler.ts`:
- Around line 1448-1501: Change decodeLegacyContinuationToken and
decodeContinuationToken to accept a parameter name (e.g.
decodeContinuationToken(token: string, parameterName: string)) and replace all
plain throws like throw new Error('Invalid continuation token') with the
standardized request error from your ERRORS collection, passing the parameter
name (e.g. throw ERRORS.InvalidArgument(parameterName) or the appropriate
ERRORS.* factory used elsewhere). Update callers (e.g. where
KeyMarker/UploadIdMarker are decoded) to pass the parameter name string such as
'KeyMarker' or 'UploadIdMarker' when calling
decodeContinuationToken/decodeLegacyContinuationToken so malformed client tokens
produce a 4xx request error instead of a generic exception.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 773df860-6e8f-4ac9-bba0-96890cba39d8
📒 Files selected for processing (20)
migrations/tenant/57-unicode-object-names.sqlsrc/http/routes/object/getSignedObject.tssrc/http/routes/object/getSignedUploadURL.tssrc/http/routes/render/renderSignedImage.tssrc/http/routes/signed-url.tssrc/storage/backend/s3/adapter.tssrc/storage/events/lifecycle/webhook-filter.tssrc/storage/events/lifecycle/webhook.tssrc/storage/object.tssrc/storage/protocols/s3/s3-handler.tssrc/test/limits.test.tssrc/test/object-list-v2.test.tssrc/test/object.test.tssrc/test/render-routes.test.tssrc/test/s3-adapter.test.tssrc/test/s3-protocol.test.tssrc/test/signed-url-route.test.tssrc/test/tus.test.tssrc/test/webhook-filter.test.tssrc/test/webhooks.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/storage/backend/s3/adapter.ts
- src/test/limits.test.ts
This allows all Unicode characters that are also valid in XML 1.0 documents. See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html See: https://www.w3.org/TR/REC-xml/#charsets
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
What kind of change does this PR introduce?
feature
What is the current behavior?
Limited set of characters are permitted in object names.
What is the new behavior?
Unicode is supported across the board.
Additional context
#133