Skip to content
This repository was archived by the owner on Jun 29, 2026. It is now read-only.

fix(workspaces): canonicalize upload content types#59

Open
urjitc wants to merge 1 commit into
mainfrom
codex/propose-fix-for-stored-xss-vulnerability
Open

fix(workspaces): canonicalize upload content types#59
urjitc wants to merge 1 commit into
mainfrom
codex/propose-fix-for-stored-xss-vulnerability

Conversation

@urjitc

@urjitc urjitc commented Jun 29, 2026

Copy link
Copy Markdown
Member

Motivation

  • Prevent stored same-origin XSS by stopping persistence of attacker-controlled multipart Content-Type values for accepted workspace files and ensure stored MIME is canonical and safe.

Description

  • Change resolveWorkspaceFileContentType to ignore the raw uploaded contentType and instead canonicalize the stored MIME via the matched upload allowlist, falling back to the descriptor's canonical MIME or application/octet-stream in src/features/workspaces/model/workspace-file/policy.ts.
  • Add regression tests src/features/workspaces/model/workspace-file/policy.test.ts that assert a spoofed invoice.png with Content-Type: text/html resolves to image/png, and that shell extension selection remains aligned with canonical MIME.
  • Only the upload canonicalization logic and the test file were modified; the change preserves existing extraction/preview logic by keeping descriptor-driven canonical MIME selection.

Testing

  • Ran a small node verification script that asserts resolveWorkspaceFileContentType({ fileName: 'invoice.png', contentType: 'text/html' }) returns image/png, and it succeeded.
  • Added and ran unit tests with vitest (node environment config) for src/features/workspaces/model/workspace-file/policy.test.ts, and the new tests passed.
  • Noted that the repository default vitest pool (Cloudflare remote pool) fails in this environment due to wrangler authentication, and vp check surfaced unrelated, pre-existing TypeScript errors in scripts/thinkex-bulk-migration.ts (these are not caused by this change).

Codex Task


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.


Summary by cubic

Canonicalizes stored MIME types for workspace uploads by relying on the allowlisted format and ignoring raw Content-Type. Prevents stored same-origin XSS and keeps extensions aligned with the canonical MIME.

  • Bug Fixes
    • resolveWorkspaceFileContentType no longer uses the uploaded header; it selects MIME from the matched upload format or the descriptor fallback.
    • Added tests: spoofed invoice.png with text/html resolves to image/png, and JPEG uploads keep a jpg shell extension.

Written for commit c75c69e. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • Bug Fixes

    • Improved file upload type detection so files are now classified using the matched file format, even when the provided content type is inconsistent.
    • File content types are now normalized to the expected format for the detected file type, with safer fallback behavior when no match is found.
    • Shell file extensions now align more consistently with the resolved file type, such as using jpg for JPEG images.
  • Tests

    • Added coverage for file type and content type resolution edge cases.

@capy-ai

capy-ai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Capy auto-review is paused for this organization because the usage-cycle auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews.

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

resolveWorkspaceFileContentType drops its early-return path that passed through input.contentType directly. Content type is now always derived from the matched upload format's mime. A new test file covers spoofed-hint canonicalization and shell-extension alignment.

Workspace File Upload Policy

Layer / File(s) Summary
Content-type canonicalization fix and tests
src/features/workspaces/model/workspace-file/policy.ts, src/features/workspaces/model/workspace-file/policy.test.ts
Removes the six-line early-return in resolveWorkspaceFileContentType that returned input.contentType when present; adds two test cases asserting that a spoofed contentType hint is ignored and the result always matches the descriptor-derived MIME and extension.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit once sniffed a mismatched mime,
"image/png" wearing "text/html" — a crime!
So I clipped the shortcut, tossed it away,
Now the format decides what content we say.
Hop hop, the tests confirm we're aligned! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: workspace upload content types are now canonicalized.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/propose-fix-for-stored-xss-vulnerability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/features/workspaces/model/workspace-file/policy.test.ts (2)

11-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Test structure uses two different descriptor sources; consider using the resolved descriptor for stronger coverage.

The test resolves descriptor from resolveWorkspaceFileTypeFromHint (line 12) but then asserts with getWorkspaceUploadFamily("image") (line 20) instead of that descriptor. While both may yield the same family, using the resolved descriptor would more directly test the end-to-end spoofing scenario and prevent hidden regressions if hint resolution ever diverges from family lookup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/features/workspaces/model/workspace-file/policy.test.ts` around lines 11
- 25, The spoofed content-type test mixes two descriptor sources, which weakens
coverage. Update the assertion in workspace-file policy tests to use the
descriptor returned by resolveWorkspaceFileTypeFromHint for
resolveWorkspaceFileContentType, instead of calling
getWorkspaceUploadFamily("image"), so the test validates the full
hint-resolution flow and catches divergences between hint matching and family
lookup.

1-44: 🧹 Nitpick | 🔵 Trivial

CI pipeline failure is infrastructure-related, not a code defect.

The Vitest cloudflare-pool worker fails due to missing CLOUDFLARE_API_TOKEN. This is a CI environment configuration issue unrelated to the test logic. Ensure the secret is available in GitHub Actions before merge.

#!/bin/bash
# Verify CI environment has required secrets
gh api repos/:owner/:repo/actions/secrets | jq '.secrets[] | select(.name == "CLOUDFLARE_API_TOKEN")'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/features/workspaces/model/workspace-file/policy.test.ts` around lines 1 -
44, The failing Vitest run is caused by a missing CI secret rather than a test
or policy issue, so no changes are needed in getWorkspaceFileShellExtension,
getWorkspaceUploadFamily, resolveWorkspaceFileContentType, or
resolveWorkspaceFileTypeFromHint. Update the GitHub Actions environment to
provide CLOUDFLARE_API_TOKEN for the cloudflare-pool worker, and verify the
secret is available before rerunning the workspace-file policy tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/features/workspaces/model/workspace-file/policy.test.ts`:
- Around line 11-25: The spoofed content-type test mixes two descriptor sources,
which weakens coverage. Update the assertion in workspace-file policy tests to
use the descriptor returned by resolveWorkspaceFileTypeFromHint for
resolveWorkspaceFileContentType, instead of calling
getWorkspaceUploadFamily("image"), so the test validates the full
hint-resolution flow and catches divergences between hint matching and family
lookup.
- Around line 1-44: The failing Vitest run is caused by a missing CI secret
rather than a test or policy issue, so no changes are needed in
getWorkspaceFileShellExtension, getWorkspaceUploadFamily,
resolveWorkspaceFileContentType, or resolveWorkspaceFileTypeFromHint. Update the
GitHub Actions environment to provide CLOUDFLARE_API_TOKEN for the
cloudflare-pool worker, and verify the secret is available before rerunning the
workspace-file policy tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5dcc0aab-f3ed-419a-b8a4-78bd8c2deaff

📥 Commits

Reviewing files that changed from the base of the PR and between 6661c82 and ae68226.

📒 Files selected for processing (2)
  • src/features/workspaces/model/workspace-file/policy.test.ts
  • src/features/workspaces/model/workspace-file/policy.ts
💤 Files with no reviewable changes (1)
  • src/features/workspaces/model/workspace-file/policy.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Re-trigger cubic

@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown

Greptile Summary

This PR canonicalizes stored workspace upload content types. The main changes are:

  • Store MIME types from the matched upload allowlist instead of raw multipart Content-Type values.
  • Keep descriptor fallback behavior for accepted workspace file families.
  • Add tests for spoofed image upload headers and shell extension alignment.

Confidence Score: 5/5

The change is narrowly scoped to upload MIME canonicalization and adds focused regression coverage for the spoofed header case.

The modified policy behavior matches the intended allowlist-driven storage model, and no review comments were identified for the changed files.

T-Rex T-Rex Logs

What T-Rex did

  • T-Rex observed that before the change, resolveWorkspaceFileContentType returned raw attacker-supplied text/html for a descriptor image with fileName 'invoice.png' and contentType 'text/html', and also returned text/html for fallback cases.
  • T-Rex updated the return expression in src/features/workspaces/model/workspace-file/policy.ts:264 to normalize content types: spoofed PNG yields image/png, descriptors with image extensions yield image/png, and descriptors with no extensions yield application/octet-stream.
  • T-Rex verified that no contract mismatch was observed after the change.

View all artifacts

T-Rex Ran code and verified through T-Rex

Reviews (1): Last reviewed commit: "fix(workspaces): canonicalize upload con..." | Re-trigger Greptile

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae68226511

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

return (
resolveMatchedUploadFormat(input, input.descriptor)?.mime ??

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop honoring spoofed allowlisted MIME values

In the upload route, validation accepts either the filename or the multipart MIME and then passes the raw file.type through to this resolver. Because this call still delegates to resolveMatchedUploadFormat, which checks MIME before extension, a client can upload photo.png with Content-Type: image/jpeg and we will store image/jpeg and choose a .jpg shell path based on the header rather than the accepted .png filename. That leaves uploaded MIME choice attacker-controlled among allowlisted values, so the canonicalization does not fully remove spoofed Content-Type persistence.

Useful? React with 👍 / 👎.

@urjitc urjitc force-pushed the codex/propose-fix-for-stored-xss-vulnerability branch from ae68226 to c75c69e Compare June 29, 2026 03:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant