
[codex] Make ChatGPT image generation easy for MCP agents#229

Open
umutkeltek wants to merge 4 commits into
steipete:mainfrom
umutkeltek:codex/mcp-chatgpt-image-artifacts

Conversation

@umutkeltek

@umutkeltek umutkeltek commented May 29, 2026

Summary

  • Add a dedicated chatgpt_image MCP tool that wraps ChatGPT browser image generation with a small agent-friendly input shape.
  • Add generateImage and outputPath to the general consult tool so advanced callers can still trigger the CLI --generate-image image-aware wait/download path directly.
  • Return saved session artifacts, including a convenience images array with paths and ChatGPT file metadata, in structured MCP output.
  • Document the new agent image workflow in MCP/browser docs, README, and changelog.

Verification

  • pnpm exec vitest run tests/mcp*.test.ts tests/mcp/**/*.test.ts
  • pnpm exec vitest run tests/mcp/chatgptImage.test.ts tests/mcp/utils.test.ts tests/mcp/consult.test.ts
  • pnpm exec oxfmt --check CHANGELOG.md README.md docs/browser-mode.md docs/mcp.md src/mcp/server.ts src/mcp/tools/consult.ts src/mcp/tools/chatgptImage.ts src/mcp/types.ts src/mcp/utils.ts tests/mcp/chatgptImage.test.ts tests/mcp/consult.test.ts tests/mcp/utils.test.ts
  • pnpm exec oxlint src/mcp/server.ts src/mcp/tools/consult.ts src/mcp/tools/chatgptImage.ts src/mcp/types.ts src/mcp/utils.ts tests/mcp/chatgptImage.test.ts tests/mcp/consult.test.ts tests/mcp/utils.test.ts
  • pnpm exec tsgo --noEmit fails on existing repo-wide issues unrelated to this patch: scripts/check.ts Bun/esbuild target typing and tests/browser/chatgptImages.test.ts fetch mock typing under Node 26.

Review follow-ups (commits cfa4f047, 4ee828de)

Security hardening

  • Constrained agent-supplied generateImage / outputPath to ORACLE_HOME_DIR by default (.. traversal rejected); ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1 opts into external paths. CLI --generate-image / --output unaffected.
  • Symlink hardening (P1): the boundary check now resolves symlinks in the existing path prefix via realpath (deepest existing ancestor + non-existent remainder), so a symlinked parent under the Oracle home (e.g. ~/.oracle/generated -> /tmp/evil) can no longer smuggle a write outside the boundary. Regression tests cover symlink escape for both fields (rejected by default), a within-home symlink (allowed), and an escape allowed under the explicit opt-in. The escape tests were verified to fail against the prior lexical check.
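
The symlink-aware boundary check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's resolveMcpOutputPath: the helper names (lexists, realpathOfDeepestAncestor, isInsideHome) and the boolean return are assumptions made for the example. It resolves the deepest existing ancestor with realpath, re-appends the not-yet-created remainder, and compares the result against the realpath of the home directory.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: true if the path entry itself exists (does not follow symlinks).
function lexists(p: string): boolean {
  try {
    fs.lstatSync(p);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: realpath the deepest existing ancestor of `target`,
// then re-append the trailing segments that do not exist yet.
function realpathOfDeepestAncestor(target: string): string {
  let current = path.resolve(target);
  const remainder: string[] = [];
  while (!lexists(current)) {
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    remainder.unshift(path.basename(current));
    current = parent;
  }
  // Throws for a dangling symlink; the caller treats that as a rejection.
  return path.join(fs.realpathSync(current), ...remainder);
}

// Assumed shape: returns true when `requested` resolves strictly inside `home`.
// `allowExternal` stands in for the ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1 opt-in.
function isInsideHome(requested: string, home: string, allowExternal = false): boolean {
  if (allowExternal) return true;
  try {
    const realHome = fs.realpathSync(home);
    const realTarget = realpathOfDeepestAncestor(path.resolve(home, requested));
    const rel = path.relative(realHome, realTarget);
    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return rel !== "" && !escapes;
  } catch {
    return false; // unresolved paths fail closed
  }
}
```

A purely lexical path.resolve prefix test would accept generated/a.png when generated is a symlink to a directory outside the home; resolving the existing prefix first is what lets the check see the real destination.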

Correctness / contract

  • Default image path carries a random suffix (no collision under parallel agent calls).
  • chatgpt_image output reuses the typed consult output shape (images / artifacts / resolved).
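
As an illustration of the collision fix, here is a minimal sketch of a default path that combines the timestamp with a random suffix. The function name, the suffix length, and the exact file-name layout are assumptions for this example; the PR only states that a random suffix was added to the default path.

```typescript
import { randomBytes } from "node:crypto";
import * as path from "node:path";

// Hypothetical helper: build a default image path that stays unique even when
// two calls share the same millisecond timestamp.
function defaultImageOutputPath(generatedDir: string, now: number = Date.now()): string {
  const stamp = now.toString(36); // timestamp component, as in the pre-fix name
  const suffix = randomBytes(4).toString("hex"); // 8 hex chars of randomness (assumed length)
  return path.join(generatedDir, `chatgpt-image-${stamp}-${suffix}.png`);
}
```

With a timestamp-only name, two parallel calls in the same millisecond produce identical paths; the random component makes that outcome vanishingly unlikely without any coordination between callers.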

Live behavior proof (P1) — real chatgpt_image MCP run (redacted), GPT-5.5 browser session, image downloaded via cookie-backed URL and saved under ORACLE_HOME_DIR/generated/:

// structuredContent (redacted)
{
  "status": "completed",
  "models": [{ "model": "gpt-5.5", "status": "completed", "usage": { "totalTokens": 81 } }],
  "images": [{
    "kind": "image",
    "path": "~/.oracle/generated/pr229-proof.png",
    "mimeType": "image/png",
    "sizeBytes": 962854,
    "width": 1254,
    "height": 1254,
    "fileId": "file_…<redacted>",
    "sourceUrl": "https://chatgpt.com/backend-api/estuary/content?id=file_…&sig=<redacted>"
  }],
  "artifacts": [
    { "kind": "image", "path": "~/.oracle/generated/pr229-proof.png", "sizeBytes": 962854 },
    { "kind": "transcript", "path": "~/.oracle/sessions/<id>/artifacts/transcript.md" }
  ],
  "requestedOutputPath": "~/.oracle/generated/pr229-proof.png"
}

Saved file verified on disk: PNG image data, 1254 x 1254, 8-bit/color RGB (963 KB).
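
For agent authors consuming this output, a caller-side sketch might look like the following. The interfaces mirror only the fields visible in the redacted sample above; which fields are optional is an assumption, and savedImagePaths is a hypothetical helper, not part of the PR.

```typescript
// Field names are taken from the redacted sample; optionality is assumed.
interface SavedImage {
  kind: "image";
  path: string;
  mimeType?: string;
  sizeBytes?: number;
  width?: number;
  height?: number;
  fileId?: string;
  sourceUrl?: string;
}

interface ImageRunOutput {
  status: string;
  images?: SavedImage[];
  artifacts?: { kind: string; path: string; sizeBytes?: number }[];
  requestedOutputPath?: string;
}

// Hypothetical helper: return the saved image paths of a completed run,
// or an empty list when the run did not complete or produced no images.
function savedImagePaths(output: ImageRunOutput): string[] {
  if (output.status !== "completed") return [];
  return (output.images ?? []).filter((image) => image.kind === "image").map((image) => image.path);
}
```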

Verification: tests/mcp 41 passed; oxfmt --check clean; oxlint clean on the listed files; no new typecheck errors in the changed files (the 3 pre-existing tsgo errors remain in untouched files).


Update — remote image-output handling (8f221a7a)

Addresses the re-review [P1]: image output was not threaded through the remote browser-service protocol, so chatgpt_image/consult image runs could not fulfill the structuredContent.images contract when ORACLE_REMOTE_HOST is configured.

Fail closed: when a remote host is configured and image output (generateImage/outputPath) is requested, the run is rejected with a clear error pointing at the local-browser path, instead of silently returning no images or proceeding into a remote run that can't deliver artifacts. Local image generation is unaffected (proven end-to-end above).

  • Regression test (remote host + token + generateImage → isError), verified to fail against the pre-fix source.
  • Documented in docs/mcp.md + changelog.
  • Full MCP suite 42 passed; format / lint / typecheck (changed files) clean.
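
The fail-closed rule can be sketched as a small guard that runs before any browser work starts. This is an illustration under assumptions: guardRemoteImageOutput, the result type, and the error wording are hypothetical; only the ORACLE_REMOTE_HOST variable, the generateImage / outputPath fields, and the isError outcome come from the PR text.

```typescript
interface ImageOutputRequest {
  generateImage?: string;
  outputPath?: string;
}

type GuardResult = { isError: false } | { isError: true; message: string };

// Hypothetical guard: reject image output up front when a remote browser
// service is configured, instead of returning a result without images.
function guardRemoteImageOutput(
  request: ImageOutputRequest,
  env: Record<string, string | undefined> = process.env,
): GuardResult {
  const remoteHost = env.ORACLE_REMOTE_HOST?.trim();
  const wantsImageOutput = Boolean(request.generateImage || request.outputPath);
  if (remoteHost && wantsImageOutput) {
    return {
      isError: true,
      message:
        "Image output is not supported through a remote browser service; " +
        "unset ORACLE_REMOTE_HOST and use the local browser path.",
    };
  }
  return { isError: false };
}
```

Rejecting early keeps the structuredContent.images contract honest: a caller either gets saved image paths or an explicit error, never a completed run with an empty list.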

@clawsweeper

clawsweeper Bot commented May 29, 2026

Codex review: needs maintainer review before merge. Reviewed May 30, 2026, 7:30 AM ET / 11:30 UTC.

Summary
The PR adds an MCP chatgpt_image tool, exposes image output fields on consult, returns saved image artifacts, hardens MCP output-path containment, fails closed for remote browser-service image output, and updates docs/tests/changelog.

Reproducibility: not applicable. This is a feature PR rather than a bug report. The PR body provides a redacted live MCP chatgpt_image run, and the source/tests show the intended paths.

Review metrics: 2 noteworthy metrics.

  • Changed Surface: 13 files, +1116/-266. The diff spans MCP API, browser image artifact handling, docs, changelog, and focused tests, so maintainers should review it as a feature plus boundary change.
  • Public MCP Tools: 1 added, 1 changed. The PR adds chatgpt_image and expands consult, which changes the public MCP surface.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none.

Risk before merge

  • [P1] The branch introduces an MCP-controlled host file-write surface; it is constrained to ORACLE_HOME_DIR by default with symlink-aware checks, but maintainers still need to accept that security boundary and the explicit external-output opt-in.
  • [P1] Remote browser-service users will receive a clear error for image output until artifact transfer is designed; that is safer than silent loss, but it is still a maintainer-visible product choice for the new feature.

Maintainer options:

  1. Accept The Bounded MCP Write Surface (recommended)
    Merge after maintainers accept ORACLE_HOME_DIR default containment, symlink-aware validation, explicit external-output opt-in, and remote fail-closed behavior as the intended MCP policy.
  2. Tighten The Output Policy First
    If maintainers want a smaller boundary, require a dedicated configured output directory or remove the external-output opt-in before merge.

Next step before merge

  • [P2] No narrow automation repair remains; the next step is maintainer judgment on the MCP file-write boundary and remote image-output policy.

Security
Cleared: No concrete security defect remains after the path-containment, symlink, opt-in external-output, and remote fail-closed changes; the new boundary still needs maintainer acceptance.

Review details

Best possible solution:

Merge the bounded local-browser MCP image path if maintainers accept the security boundary; otherwise require a stricter output-directory policy or a remote artifact-transfer design before merge.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a feature PR rather than a bug report. The PR body provides a redacted live MCP chatgpt_image run, and the source/tests show the intended paths.

Is this the best way to solve the issue?

Yes, with maintainer acceptance of the boundary: the implementation reuses the existing ChatGPT image wait/download path, constrains MCP-controlled paths, and fails closed for unsupported remote-service image output.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6019a199e44c.

Label changes

Label changes:

  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes redacted live MCP output showing a completed chatgpt_image run with populated image metadata and an on-disk PNG verification.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P2: This is a bounded MCP/browser feature with normal maintainer-review urgency and limited blast radius.
  • merge-risk: 🚨 security-boundary: The PR gives MCP callers a new way to request host file writes for generated images, guarded by containment and an explicit opt-in for external output.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes redacted live MCP output showing a completed chatgpt_image run with populated image metadata and an on-disk PNG verification.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted live MCP output showing a completed chatgpt_image run with populated image metadata and an on-disk PNG verification.

Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was present and fully read; its browser/MCP smoke guidance and changelog grouping note were relevant to this review. (AGENTS.md:1, 6019a199e44c)
  • New MCP image wrapper: chatgpt_image forces browser mode, maps the image output path to generateImage, defaults reference files to upload, and delegates to the shared consult runner. (src/mcp/tools/chatgptImage.ts:87, 8f221a7a4c3f)
  • MCP output containment: MCP-supplied generateImage and outputPath are resolved through existing symlink ancestors and rejected unless the real target stays under the Oracle home, unless explicit external output is enabled. (src/mcp/utils.ts:65, 8f221a7a4c3f)
  • Remote fail-closed behavior: When a remote browser service is configured and image output is requested, the MCP consult path returns an error instead of silently running without transferable artifacts. (src/mcp/tools/consult.ts:645, 8f221a7a4c3f)
  • Regression coverage: Tests cover remote-service fail-closed behavior and path rejection for traversal and symlink escapes, including both generateImage and outputPath. (tests/mcp/consult.test.ts:361, 8f221a7a4c3f)
  • Current-main provenance: Current main blame for the central MCP, browser-image, and remote-service files points to the v0.13.0 release commit as the available local history root for this shallow checkout. (src/mcp/tools/consult.ts:1, abb7c9a7d9c8)

Likely related people:

  • Peter Steinberger: Current-main blame for the central MCP and browser image files points to the v0.13.0 release commit by this author, and the local shortlog shows dominant activity in the affected area. (role: feature history owner / release integrator; confidence: high; commits: abb7c9a7d9c8; files: src/mcp/tools/consult.ts, src/mcp/utils.ts, src/browser/chatgptImages.ts)
  • pdurlej: Release notes credit this handle on closely related MCP/browser consult, dry-run, model-strategy, follow-up, and generated-image artifact work; the PR timeline also shows them being mentioned on this review thread. (role: adjacent feature contributor; confidence: medium; commits: abb7c9a7d9c8; files: CHANGELOG.md, docs/mcp.md, src/mcp/tools/consult.ts)

What the crustacean ranks mean

  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. labels May 29, 2026
@umutkeltek umutkeltek force-pushed the codex/mcp-chatgpt-image-artifacts branch from d54ef7e to d4df728 Compare May 29, 2026 18:06
@umutkeltek umutkeltek changed the title [codex] Expose ChatGPT image outputs to MCP agents [codex] Make ChatGPT image generation easy for MCP agents May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 29, 2026
@umutkeltek umutkeltek marked this pull request as ready for review May 29, 2026 21:22
Address review feedback on the chatgpt_image / consult image-output work.

Security: constrain agent-supplied generateImage / outputPath to the Oracle
home directory by default (resolveMcpOutputPath in src/mcp/utils.ts). MCP
clients are less trusted than the CLI user, so an agent must not be able to
write generated images or saved responses to arbitrary host paths; `..`
traversal is rejected by the resolved-prefix check. Opt into external paths
with ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1. A rejected path now returns a clean
MCP error instead of throwing. CLI --generate-image / --output are unaffected.

Correctness: the default image output path (and the browser-side session-less
fallback) now carry a random suffix so concurrent agent calls in the same
millisecond cannot resolve to the same path and overwrite each other.

Contract: chatgpt_image reuses the typed consult output shape
(images / artifacts / resolved) instead of loose z.record(...any), so the
advertised structuredContent.images shape is actually declared.

Tests: allow/reject path coverage for mapConsultToRunOptions, the consult MCP
handler, and the chatgpt_image handler; default-path uniqueness assertion.
@umutkeltek
Author

Did an independent pass on this and pushed a follow-up commit (cfa4f047) to address the review. Summary of findings + what changed.

Findings

The bot's two P1s (output-path security boundary, needs real proof) are fair. Three refinements after reading the actual write path (chatgptImages.ts, sessionRunner.ts):

  1. outputPath is the stronger write primitive, not the image bytes. The image saver partly self-limits — resolveSiblingImagePath forces a content-type extension on extensionless paths, and the bytes are ChatGPT-rendered image data (not attacker-chosen). But outputPath writes the model's text response verbatim (sessionRunner.ts:991-1007), text is fully caller-steerable, and its only guard is "not inside the sessions dir" — a state-corruption guard, not a security boundary. It mkdir -ps the parent, overwrites, and on EPERM even falls back to another path. Containment has to cover both fields.
  2. Default image path collides under parallelism (bot missed this). resolveDefaultImageOutputPath was chatgpt-image-${Date.now().toString(36)}.png. Two concurrent chatgpt_image calls in the same millisecond resolve to the same path and overwrite each other — wrong default for a tool built for parallel agents.
  3. Output schema didn't declare its advertised shape. images/artifacts/resolved were z.record(z.string(), z.any()), so the "returns saved paths in structuredContent.images" contract wasn't actually typed.

Changes in cfa4f047

  • Containment (both fields): resolveMcpOutputPath constrains agent-supplied generateImage / outputPath to ORACLE_HOME_DIR by default; .. traversal is rejected by the resolved-prefix check. Opt into external paths with ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1. Rejected paths return a clean MCP error (not a throw). CLI --generate-image / --output are untouched; the boundary is MCP-only.
  • Collision fix: default image path (and the browser-side session-less fallback) now carry a random suffix.
  • Contract: chatgpt_image output reuses the typed consult output shape (images / artifacts / resolved) so the structuredContent shape is consistent and declared.
  • Tests (the bot's P1 rank-up move): allow/reject path coverage for mapConsultToRunOptions, the consult MCP handler, and the chatgpt_image handler, plus a default-path-uniqueness assertion. Full MCP + image suites green (43 tests); format/lint clean.

Still open

  • Residual symlink escape: path.resolve doesn't resolve symlinks, so a symlinked dir under ORACLE_HOME_DIR pointing outside could bypass the prefix check. Left as a known follow-up (would need realpath on the existing parent).
  • Live proof: the cheap dryRun:true path now exercises resolution + containment in tests, but a single redacted real ChatGPT image run (cookie download → populated structuredContent.images[]) is still worth attaching before merge — automation can't supply that.

Address review: the boundary check in resolveMcpOutputPath was lexical
(path.resolve prefix), so a pre-existing symlinked directory under
ORACLE_HOME_DIR (e.g. ~/.oracle/generated -> /tmp/evil) could pass the
string-prefix test while the actual image/text write landed outside the
documented MCP containment boundary.

Resolve the target through symlinks before the boundary check: realpath the
deepest existing ancestor of the requested path and re-append the not-yet-
created remainder, then compare against the realpath of the Oracle home. A
symlinked parent that escapes the home is now rejected unless
ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1 is set.

Tests: symlink-escape regression for BOTH generateImage and outputPath
(rejected by default), a within-home symlink (allowed), and an escape allowed
under the explicit opt-in. Verified the escape tests fail against the prior
lexical check and pass after. Docs/changelog note the symlink-safe containment.
@umutkeltek
Author

Both remaining P1s addressed.

Symlink hardening (4ee828de): resolveMcpOutputPath no longer relies on a lexical prefix check. It resolves symlinks in the existing path prefix via realpath (deepest existing ancestor + the not-yet-created remainder) and compares against the realpath of the Oracle home, so a symlinked parent under ORACLE_HOME_DIR (e.g. generated -> /tmp/evil) can't smuggle a write outside the boundary. Regression tests cover symlink escape for both generateImage and outputPath (rejected by default), a within-home symlink (allowed), and an escape allowed under ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1. Verified the escape tests fail against the prior lexical check and pass after.

Live behavior proof — ran a real chatgpt_image MCP call end-to-end (redacted output in the PR body): GPT-5.5 browser session → image generated → cookie-backed download → saved under ORACLE_HOME_DIR/generated/ with a populated structuredContent.images[] (path, mimeType, sizeBytes: 962854, 1254x1254, fileId). File verified on disk as PNG image data, 1254 x 1254.

tests/mcp 41 passed; oxfmt/oxlint clean on the listed files; PR body updated with details.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 30, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 30, 2026
@umutkeltek
Author

No code changes since the last review cleared the security defect (paths confined under the realpathed Oracle home, symlink escapes tested, external writes gated behind ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT), with live chatgpt_image proof in the body. Requesting a fresh pass to confirm the current verdict.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 30, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 30, 2026
Review found that consult/chatgpt_image image output was not threaded through
the remote browser-service protocol: when ORACLE_REMOTE_HOST is configured the
generated image is written on the remote host and never transferred back, so the
promised structuredContent.images contract could not be fulfilled (the run would
silently return no images, or proceed into a remote run that can't deliver them).

Reject image output explicitly when a remote host is configured, with a clear
error pointing the caller at the local-browser path. Local image generation is
unaffected (verified end-to-end). Adds a regression test (remote host + token +
generateImage -> isError) and documents the limitation in docs/mcp.md + changelog.
@umutkeltek
Author

Closed the remote image-output [P1] from the last pass.

8f221a7a: image output now fails closed when a remote browser service is configured (ORACLE_REMOTE_HOST). The remote executor doesn't transfer image artifacts back, so rather than silently returning empty structuredContent.images, the run is rejected with a clear error pointing at the local-browser path. Local image generation is unchanged (proven live in the body). Regression test added (remote host + token + generateImage → isError, verified to fail against the pre-fix source); documented in docs/mcp.md + changelog. Full MCP suite 42 passed; format/lint/typecheck clean.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 30, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 30, 2026

Labels

merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.
