steipete · umutkeltek · May 29, 2026 · May 30, 2026 · May 30, 2026 · May 30, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,12 @@
 ### Added
 
 - Browser: `oracle session <id> --harvest` and `--live` now auto-recover when the original Chrome has been closed by relaunching the manual-login profile and reopening the saved conversation URL, then retrying the harvest against the recovered tab. Resolves the failure mode where a long GPT-5 Pro Extended response completed in the background after the CLI's 20-minute wall expired and the conversation was archived. Recovery URL selection prefers `browser.harvest.url` over `browser.runtime.tabUrl` and is gated by a shared ChatGPT-conversation-URL check (rejects home, project shell, and external URLs so the persistent profile can't be navigated to the wrong page from stale metadata). Opt out with `--no-recover` on the `session` subcommand.
+- MCP: add a dedicated `chatgpt_image` tool plus `generateImage` / `outputPath` support in `consult` so agent callers can trigger the ChatGPT image-aware wait/download path used by CLI `--generate-image`; saved image artifacts now come back in `structuredContent.images`. The `chatgpt_image` output reuses the typed `consult` output contract (`images` / `artifacts` / `resolved`) and its default output path carries a random suffix so concurrent agent calls are very unlikely to collide.
+
+### Security
+
+- MCP: constrain agent-supplied `generateImage` / `outputPath` to the Oracle home directory (`ORACLE_HOME_DIR`) by default so an MCP caller cannot write generated images or saved responses to arbitrary host paths. `..` traversal is rejected, and the boundary check resolves symlinks in the existing path prefix (via `realpath`) so a symlinked parent under the Oracle home cannot smuggle a write outside it. Set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to opt into external output paths as an explicit decision. CLI `--generate-image` / `--output` are unaffected.
+- MCP: `chatgpt_image` / `consult` image output fails closed when a remote browser service is configured (`ORACLE_REMOTE_HOST`), since the remote executor does not transfer image artifacts back and the `structuredContent.images` contract could not be fulfilled. The run is rejected with a clear error instead of silently returning no images.
 
 ## 0.13.0 — 2026-05-22
 

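The boundary check described in the Security entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the helper names `realpathOfExistingPrefix` and `isInsideHome` are hypothetical, and the sketch handles `..` by resolving the path rather than by inspecting segments. It shows why resolving symlinks in the existing prefix matters when the output file itself does not exist yet.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not from this PR): resolve symlinks in the longest
// existing prefix of `target`, then re-append the segments that do not exist yet.
function realpathOfExistingPrefix(target: string): string {
  let current = path.resolve(target);
  const missing: string[] = [];
  for (;;) {
    try {
      return path.join(fs.realpathSync(current), ...missing);
    } catch (error) {
      // Only "does not exist" is expected here; any other error fails closed.
      if ((error as { code?: string }).code !== "ENOENT") throw error;
      const parent = path.dirname(current);
      if (parent === current) throw error; // reached the filesystem root
      missing.unshift(path.basename(current));
      current = parent;
    }
  }
}

// Hypothetical check: true only when `candidate` resolves strictly inside `homeDir`.
function isInsideHome(homeDir: string, candidate: string): boolean {
  const home = realpathOfExistingPrefix(homeDir);
  const resolved = realpathOfExistingPrefix(candidate);
  const relative = path.relative(home, resolved);
  const escapes = relative === ".." || relative.startsWith(`..${path.sep}`);
  return relative !== "" && !escapes && !path.isAbsolute(relative);
}

// Demo in a throwaway directory: a plain path, a `..` escape, and a symlinked parent.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "oracle-home-demo-"));
const home = path.join(tmp, "home");
const outside = path.join(tmp, "outside");
fs.mkdirSync(home);
fs.mkdirSync(outside);
fs.symlinkSync(outside, path.join(home, "link"), "dir");

const plainOk = isInsideHome(home, path.join(home, "generated", "a.png"));
const traversalOk = isInsideHome(home, `${home}${path.sep}..${path.sep}outside${path.sep}a.png`);
const symlinkOk = isInsideHome(home, path.join(home, "link", "a.png"));
console.log({ plainOk, traversalOk, symlinkOk });
fs.rmSync(tmp, { recursive: true, force: true });
```

A plain string-prefix comparison would accept `home/link/a.png` in this demo, because the escape only becomes visible once the symlinked parent is resolved.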
diff --git a/README.md b/README.md
@@ -125,6 +125,7 @@ Engine auto-picks API when `OPENAI_API_KEY` is set, otherwise browser; browser i
 - Browser support: stable on macOS; works on Linux (add `--browser-chrome-path/--browser-cookie-path` when needed) and Windows (manual-login or inline cookies recommended when app-bound cookies block decryption).
 - Remote browser service: `oracle serve` on a signed-in host; clients use `--remote-host/--remote-token`.
 - Browser artifacts: browser sessions save `transcript.md` and generated artifacts under `~/.oracle/sessions/<id>/artifacts/`. Deep Research saves `deep-research-report.md` when the report surface is captured; ChatGPT-generated images are downloaded with the active browser cookies when image URLs are present.
+- MCP image agents: use the `chatgpt_image` tool for the easiest path, or pass `generateImage` (a path under `ORACLE_HOME_DIR`, unless `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` is set) to `consult` with `engine: "browser"`; saved paths come back in `structuredContent.images`.
 - Browser archiving: by default, successful non-project, non-Deep-Research, non-multi-turn ChatGPT one-shots are archived after local artifacts are saved. Use `--browser-archive never` to disable or `--browser-archive always` to force archiving after a successful browser run. Archived chats remain manageable in ChatGPT.
 - Conversation mode guidance: use one-shot browser runs for narrow bug reports or quick file-set reviews; use explicit browser follow-ups for ambiguous architecture/product tradeoffs where a challenge pass and final decision are valuable; use Deep Research for broad public-web questions that need citations. Oracle never invents follow-ups automatically.
 - Project Sources: `oracle project-sources list|add --chatgpt-url <project-url>` manages the Project Sources tab in ChatGPT browser mode. v1 is append-only (`list`, `add`, `--dry-run`) so agents can share explicit project context without deleting or replacing user sources.

diff --git a/docs/browser-mode.md b/docs/browser-mode.md
@@ -238,6 +238,8 @@ oracle --engine browser \
 
 If ChatGPT returns multiple images, the first image saves to the requested path and the rest save as numbered siblings. Without `--generate-image`, Oracle writes images to the session `artifacts/` directory.
 
+MCP agents should prefer the `chatgpt_image` tool. It wraps the same behavior with a smaller input shape, uploads reference files by default, and returns saved files in `structuredContent.images`. Advanced callers can still pass `generateImage` to `consult` directly.
+
 ### Manual login mode (persistent profile, no cookie copy)
 
 Use `--browser-manual-login` when cookie decrypt is blocked (e.g., Windows app-bound cookies) or you prefer to sign in explicitly. You can also make it the default via `browser.manualLogin` in `~/.oracle/config.json`.

diff --git a/docs/mcp.md b/docs/mcp.md
@@ -8,22 +8,54 @@ Claude Code can call `oracle-mcp` and ask a subscription-backed ChatGPT browser
 
 ## Tools
 
+### `chatgpt_image`
+
+- Inputs: `prompt` (required), `files?: string[]` for reference images/assets, `outputPath?: string`, `aspectRatio?: string`, `model?: string`, `slug?: string`, `dryRun?: boolean`, plus the browser controls `browserAttachments`, `browserThinkingTime`, `browserModelLabel`, `browserModelStrategy`, `browserArchive`, and `browserKeepBrowser`. The input schema is strict, so unknown keys are rejected.
+- Behavior: convenience wrapper for ChatGPT browser image generation. It forces `engine:"browser"`, sets `generateImage` for the existing image-aware wait/download path, and defaults `browserAttachments:"always"` when files are provided so reference images are uploaded instead of pasted.
+- Output: returns the same structured output as `consult`, including `structuredContent.images[]` with saved image paths and ChatGPT file metadata when available, plus `structuredContent.requestedOutputPath`, which echoes the path the run was asked to save to. If `outputPath` is omitted, Oracle picks a unique file under `ORACLE_HOME_DIR/generated/`.
+- Output path safety: agent-supplied `outputPath` must resolve under `ORACLE_HOME_DIR` by default. Paths that escape it are rejected, whether through `..` traversal or through a symlinked parent (symlinks in the existing path prefix are resolved via `realpath` before the check). Set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to allow writing elsewhere as an explicit decision. Omit `outputPath` to use the safe default.
+- Local browser only: image output is unsupported when a remote browser service is configured (`ORACLE_REMOTE_HOST`); the image would be written on the remote host and not transferred back, so `chatgpt_image`/`consult` image runs fail closed with a clear error rather than returning empty `structuredContent.images`. Run on the local browser to generate images.
+
+```json
+{
+  "prompt": "Create a 9:16 App Store screenshot background for a focus timer.",
+  "files": ["./reference-screen.png"],
+  "aspectRatio": "9:16"
+}
+```
+
 ### `consult`
 
 - Inputs: `prompt` (required), `files?: string[]` (globs), `model?: string` (defaults to CLI), `engine?: "api" | "browser"` (optional; Oracle follows CLI defaults: `ORACLE_ENGINE` and the effective config first, then API when `OPENAI_API_KEY` is set, otherwise browser), `slug?: string`.
 - Presets: `preset?: "chatgpt-pro-heavy"` applies browser mode + current Pro model alias + extended thinking, unless the request overrides those fields.
-- Browser-only extras: `browserAttachments?: "auto"|"never"|"always"`, `browserBundleFiles?: boolean`, `browserBundleFormat?: "text"|"zip"`, `browserThinkingTime?: "light"|"standard"|"extended"|"heavy"`, `browserResearchMode?: "deep"`, `browserFollowUps?: string[]`, `browserArchive?: "auto"|"always"|"never"`, `browserKeepBrowser?: boolean`, `browserModelLabel?: string`, `browserModelStrategy?: "select"|"current"|"ignore"`.
+- Browser-only extras: `browserAttachments?: "auto"|"never"|"always"`, `browserBundleFiles?: boolean`, `browserBundleFormat?: "text"|"zip"`, `browserThinkingTime?: "light"|"standard"|"extended"|"heavy"`, `browserResearchMode?: "deep"`, `browserFollowUps?: string[]`, `browserArchive?: "auto"|"always"|"never"`, `browserKeepBrowser?: boolean`, `browserModelLabel?: string`, `browserModelStrategy?: "select"|"current"|"ignore"`, `generateImage?: string`, `outputPath?: string`.
 - Dry runs: set `dryRun: true` to preview the resolved request without creating a session or touching the browser.
 - Behavior: starts a session, runs it with the chosen engine, returns final output + metadata. Background/foreground follows the CLI (e.g., GPT‑5 Pro detaches by default). If API mode fails because `OPENAI_API_KEY` is missing and you have ChatGPT Pro, retry with `engine: "browser"` or `preset: "chatgpt-pro-heavy"` to use your signed-in ChatGPT session instead of an API key.
 - Logging: emits MCP logs (`info` per line, `debug` for streamed chunks with byte sizes). If browser prerequisites are missing, returns an error payload instead of running.
 - Research mode: set `browserResearchMode:"deep"` for broad public-web research and cited reports. Use normal browser runs with `gpt-5.5-pro` + `browserThinkingTime:"extended"` for Pro Extended code review, or `gpt-5.5` + `browserThinkingTime:"heavy"` when you explicitly want Thinking Heavy.
 - Multi-turn consults: set `browserFollowUps:["Challenge your recommendation", "Give the final decision"]` to keep one ChatGPT browser conversation open and ask sequential follow-up prompts. Use one-shot calls for narrow bugs and exact file-set reviews; use multi-turn for ambiguous architecture/product decisions where a challenge pass and final recommendation are useful; use Deep Research for broad public-web work with citations. Oracle never invents follow-ups automatically.
 - Archiving: set `browserArchive:"auto"|"always"|"never"` to control ChatGPT conversation cleanup. `auto` archives only successful browser one-shots after local artifacts are saved, and skips project, Deep Research, multi-turn, failed, and incomplete sessions.
+- ChatGPT image generation: set `engine:"browser"` and `generateImage` to a path under `ORACLE_HOME_DIR` to use the same image-aware wait/download path as CLI `--generate-image`. Saved files are returned in `structuredContent.images` and recorded as session artifacts; multiple images save as numbered siblings. Agent-supplied `generateImage` / `outputPath` are constrained to `ORACLE_HOME_DIR` by default (set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to allow external paths).
 
 #### Long browser consults from agents
 
 Browser-backed GPT-5.5 Pro consults can legitimately run for many minutes. Some MCP clients show little progress while a tool call is active, so agents should treat a long Oracle call as a running browser job, not as a failed step. Start with `dryRun:true` when configuring a new agent, prefer `preset:"chatgpt-pro-heavy"` or `engine:"browser"` explicitly, and use the shared session store (`sessions`, `oracle status`, or `oracle session <id>`) before retrying a prompt. If the browser control plan says Oracle will launch visible Chrome, use attach/remote Chrome when the operator is actively using the computer.
 
+#### ChatGPT images from agents
+
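+For generated images, pass an explicit `generateImage` path. The opt-in matters because it switches the browser wait loop to watch for ChatGPT image artifacts instead of only assistant text. The path must resolve under `ORACLE_HOME_DIR` unless `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` is set. In the example, `${ORACLE_HOME_DIR}` is a placeholder for the resolved Oracle home directory; JSON does not expand environment variables, so send the expanded path.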
+
+```json
+{
+  "engine": "browser",
+  "model": "gpt-5.5-pro",
+  "prompt": "Create a 9:16 App Store screenshot background for a focus timer.",
+  "generateImage": "${ORACLE_HOME_DIR}/generated/focus-timer-bg.png"
+}
+```
+
+The MCP response includes `structuredContent.images[]` with the saved file path, MIME type, size, and ChatGPT file metadata when available.
+
 ### `sessions`
 
 - Inputs: `{id?, hours?, limit?, includeAll?, detail?}` mirroring `oracle status` / `oracle session`.

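The "Local browser only" rule documented above amounts to a fail-closed guard that runs before any browser work starts. A minimal sketch, assuming the remote service is detected only through the `ORACLE_REMOTE_HOST` environment variable; the function name `rejectRemoteImageRun`, the error wording, and the example host are hypothetical, and the real check may also consult config:

```typescript
// Shape of an MCP tool error result, matching the pattern used in chatgptImage.ts.
interface ToolErrorResult {
  isError: true;
  content: { type: "text"; text: string }[];
}

// Hypothetical guard: name and message are assumptions, not Oracle's actual code.
function rejectRemoteImageRun(
  input: { generateImage?: string },
  env: Record<string, string | undefined>,
): ToolErrorResult | undefined {
  const remoteHost = env.ORACLE_REMOTE_HOST?.trim();
  if (!input.generateImage || !remoteHost) {
    return undefined; // not an image run, or no remote service: proceed locally
  }
  return {
    isError: true,
    content: [
      {
        type: "text",
        text: `Image output is not supported with a remote browser service (${remoteHost}); run on the local browser.`,
      },
    ],
  };
}

// Example values only; the host and path are made up for the demo.
const imageRun = { generateImage: "/home/me/.oracle/generated/out.png" };
const blocked = rejectRemoteImageRun(imageRun, { ORACLE_REMOTE_HOST: "remote.example:1234" });
const allowed = rejectRemoteImageRun(imageRun, {});
console.log(blocked?.isError, allowed);
```

Returning an explicit error keeps the `structuredContent.images` contract honest: a caller either gets saved images or a clear failure, never an empty success.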
diff --git a/src/browser/chatgptImages.ts b/src/browser/chatgptImages.ts
@@ -1,5 +1,6 @@
 import fs from "node:fs/promises";
 import path from "node:path";
+import { randomUUID } from "node:crypto";
 import type {
   BrowserGeneratedImage,
   BrowserLogger,
@@ -203,9 +204,10 @@ function resolveDefaultGeneratedImagePath(
   sessionId?: string,
 ): string {
   const primary = images[0];
-  const stemSource =
-    primary?.fileId || primary?.alt || primary?.url || `generated-${Date.now().toString(36)}`;
-  const stem = sanitizeGeneratedImageStem(stemSource) || `generated-${Date.now().toString(36)}`;
+  // Random fallback token keeps concurrent session-less saves from colliding.
+  const uniqueFallback = `generated-${Date.now().toString(36)}-${randomUUID().slice(0, 8)}`;
+  const stemSource = primary?.fileId || primary?.alt || primary?.url || uniqueFallback;
+  const stem = sanitizeGeneratedImageStem(stemSource) || uniqueFallback;
   const baseDir = sessionId
     ? resolveSessionArtifactsDir(sessionId)
     : path.join(getOracleHomeDir(), ".temp");

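The effect of the new fallback stem can be seen in isolation. The sketch below reproduces only the stem expression from the hunk above inside a hypothetical `uniqueFallbackStem` function, with the timestamp passed in so that two calls can share the same millisecond:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical wrapper around the stem expression from the diff; the timestamp
// is a parameter so the demo can simulate two saves in the same millisecond.
function uniqueFallbackStem(nowMs: number): string {
  return `generated-${nowMs.toString(36)}-${randomUUID().slice(0, 8)}`;
}

const sameInstant = Date.now();
const firstStem = uniqueFallbackStem(sameInstant);
const secondStem = uniqueFallbackStem(sameInstant);

// The timestamp parts match; only the random suffix tells the two stems apart.
console.log(firstStem, secondStem, firstStem !== secondStem);
```

The suffix is the first 8 hex characters of a v4 UUID, that is, 32 random bits. That makes a collision between two saves in the same millisecond very unlikely rather than impossible, which is a reasonable trade for a short, readable file name.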
diff --git a/src/mcp/server.ts b/src/mcp/server.ts
@@ -5,6 +5,7 @@ import { pathToFileURL } from "node:url";
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
 import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
 import { getCliVersion } from "../version.js";
+import { registerChatGptImageTool } from "./tools/chatgptImage.js";
 import { registerConsultTool } from "./tools/consult.js";
 import { registerProjectSourcesTool } from "./tools/projectSources.js";
 import { registerSessionsTool } from "./tools/sessions.js";
@@ -24,6 +25,7 @@ export async function startMcpServer(): Promise<void> {
   );
 
   registerConsultTool(server);
+  registerChatGptImageTool(server);
   registerProjectSourcesTool(server);
   registerSessionsTool(server);
   registerSessionResources(server);

diff --git a/src/mcp/tools/chatgptImage.ts b/src/mcp/tools/chatgptImage.ts
@@ -0,0 +1,142 @@
+import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js";
+import path from "node:path";
+import { randomUUID } from "node:crypto";
+import { z } from "zod";
+import { getOracleHomeDir } from "../../oracleHome.js";
+import type { ConsultInput } from "../types.js";
+import { consultOutputShape, runConsultTool } from "./consult.js";
+
+const chatGptImageInputShape = {
+  prompt: z.string().min(1, "Prompt is required.").describe("Image generation prompt."),
+  files: z
+    .array(z.string())
+    .default([])
+    .describe("Optional reference image/file paths or globs to upload to ChatGPT."),
+  outputPath: z
+    .string()
+    .optional()
+    .describe(
+      "Where to save the first generated image. Defaults to a unique file under ORACLE_HOME_DIR/generated/.",
+    ),
+  aspectRatio: z
+    .string()
+    .optional()
+    .describe('Optional requested image aspect ratio, e.g. "1:1", "9:16", or "16:9".'),
+  model: z
+    .string()
+    .optional()
+    .describe("Optional ChatGPT/browser model label or alias. Defaults follow Oracle config."),
+  browserModelLabel: z.string().optional().describe("Explicit ChatGPT UI model label to select."),
+  browserAttachments: z
+    .enum(["auto", "never", "always"])
+    .optional()
+    .describe(
+      'How to deliver files. Defaults to "always" when files are present so reference images are uploaded.',
+    ),
+  browserThinkingTime: z
+    .enum(["light", "standard", "extended", "heavy"])
+    .optional()
+    .describe("Set ChatGPT thinking time when supported by the chosen model."),
+  browserModelStrategy: z
+    .enum(["select", "current", "ignore"])
+    .optional()
+    .describe("Model picker strategy. Mirrors the consult tool and CLI browser flag."),
+  browserArchive: z
+    .enum(["auto", "always", "never"])
+    .optional()
+    .describe("Archive completed ChatGPT conversations after local artifacts are saved."),
+  browserKeepBrowser: z
+    .boolean()
+    .optional()
+    .describe("Keep Chrome running after completion for debugging."),
+  dryRun: z
+    .boolean()
+    .optional()
+    .describe("Preview the resolved image run without touching the browser."),
+  slug: z.string().optional().describe("Optional human-friendly session id."),
+} satisfies z.ZodRawShape;
+
+const chatGptImageOutputShape = {
+  // Mirror the consult output contract so structuredContent stays consistent
+  // (images/artifacts/resolved are typed by the shared consult shapes), plus the
+  // image-specific echo of the requested path.
+  ...consultOutputShape,
+  requestedOutputPath: z.string(),
+} satisfies z.ZodRawShape;
+
+const chatGptImageInputSchema = z.object(chatGptImageInputShape).strict();
+
+export type ChatGptImageInput = z.infer<typeof chatGptImageInputSchema>;
+
+function resolveDefaultImageOutputPath(): string {
+  // Include a random token so concurrent agent calls in the same millisecond do
+  // not resolve to the same default path and overwrite each other.
+  const unique = `${Date.now().toString(36)}-${randomUUID().slice(0, 8)}`;
+  return path.join(getOracleHomeDir(), "generated", `chatgpt-image-${unique}.png`);
+}
+
+function appendAspectRatio(prompt: string, aspectRatio?: string): string {
+  const requestedAspectRatio = aspectRatio?.trim();
+  if (!requestedAspectRatio) {
+    return prompt.trim();
+  }
+  return `${prompt.trim()}\n\nCreate the image with aspect ratio ${requestedAspectRatio}.`;
+}
+
+export function buildChatGptImageConsultInput(input: ChatGptImageInput): ConsultInput {
+  const files = input.files ?? [];
+  const outputPath = input.outputPath?.trim() || resolveDefaultImageOutputPath();
+  const browserAttachments =
+    input.browserAttachments ?? (files.length > 0 ? ("always" as const) : undefined);
+  return {
+    prompt: appendAspectRatio(input.prompt, input.aspectRatio),
+    files,
+    model: input.model,
+    engine: "browser",
+    browserModelLabel: input.browserModelLabel,
+    browserAttachments,
+    browserThinkingTime: input.browserThinkingTime,
+    browserModelStrategy: input.browserModelStrategy,
+    browserArchive: input.browserArchive,
+    browserKeepBrowser: input.browserKeepBrowser,
+    generateImage: outputPath,
+    dryRun: input.dryRun,
+    slug: input.slug,
+  };
+}
+
+export function registerChatGptImageTool(server: McpServer): void {
+  server.registerTool(
+    "chatgpt_image",
+    {
+      title: "Generate an image with ChatGPT",
+      description:
+        "Agent-friendly wrapper for ChatGPT browser image generation. It selects browser mode, enables the image-aware wait/download path, uploads reference files when provided, and returns saved image paths in structuredContent.images.",
+      inputSchema: chatGptImageInputShape,
+      outputSchema: chatGptImageOutputShape,
+    },
+    async (input: unknown): Promise<CallToolResult> => {
+      const textContent = (text: string) => [{ type: "text" as const, text }];
+      let parsed;
+      try {
+        parsed = chatGptImageInputSchema.parse(input);
+      } catch (error) {
+        return {
+          isError: true,
+          content: textContent(error instanceof Error ? error.message : String(error)),
+        };
+      }
+      const consultInput = buildChatGptImageConsultInput(parsed);
+      const result = await runConsultTool(consultInput, { server: server.server });
+      const structuredContent = {
+        ...(result.structuredContent ?? {}),
+        requestedOutputPath: consultInput.generateImage,
+      };
+      return {
+        ...result,
+        structuredContent,
+      };
+    },
+  );
+}
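To make the wrapper's mapping concrete, here is a dependency-free sketch of what `buildChatGptImageConsultInput` does with the example request from `docs/mcp.md`. The types `ImageToolInput` and `ConsultLikeInput` and the function `toConsultInput` are simplified stand-ins invented for this illustration; the real code uses the zod-inferred `ChatGptImageInput`, returns a full `ConsultInput`, and generates its own default path instead of taking one as a parameter.

```typescript
type Attachments = "auto" | "never" | "always";

// Simplified stand-ins for the real input and output types (assumptions).
interface ImageToolInput {
  prompt: string;
  files?: string[];
  aspectRatio?: string;
  outputPath?: string;
  browserAttachments?: Attachments;
}

interface ConsultLikeInput {
  prompt: string;
  files: string[];
  engine: "browser";
  browserAttachments?: Attachments;
  generateImage: string;
}

// Mirrors the mapping in the diff: force browser mode, append the aspect ratio
// to the prompt, default attachments to "always" when files are present, and
// fall back to a default output path when none is supplied.
function toConsultInput(input: ImageToolInput, defaultOutputPath: string): ConsultLikeInput {
  const files = input.files ?? [];
  const ratio = input.aspectRatio?.trim();
  const prompt = ratio
    ? `${input.prompt.trim()}\n\nCreate the image with aspect ratio ${ratio}.`
    : input.prompt.trim();
  return {
    prompt,
    files,
    engine: "browser",
    browserAttachments: input.browserAttachments ?? (files.length > 0 ? "always" : undefined),
    generateImage: input.outputPath?.trim() || defaultOutputPath,
  };
}

// The default path here is a made-up example value.
const demoDefaultPath = "/tmp/oracle-home/generated/chatgpt-image-demo.png";
const mapped = toConsultInput(
  {
    prompt: "Create a 9:16 App Store screenshot background for a focus timer.",
    files: ["./reference-screen.png"],
    aspectRatio: "9:16",
  },
  demoDefaultPath,
);
const textOnly = toConsultInput({ prompt: "  A red kite.  " }, demoDefaultPath);
console.log(mapped, textOnly);
```

Because the aspect ratio travels as a sentence appended to the prompt rather than as a separate field, ChatGPT treats it as a request, not a guarantee; callers that need exact dimensions should check the saved file.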