diff --git a/CHANGELOG.md b/CHANGELOG.md index 8a9142738..ad6e55005 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,12 @@ ### Added - Browser: `oracle session --harvest` and `--live` now auto-recover when the original Chrome has been closed by relaunching the manual-login profile and reopening the saved conversation URL, then retrying the harvest against the recovered tab. Resolves the failure mode where a long GPT-5 Pro Extended response completed in the background after the CLI's 20-minute wall expired and the conversation was archived. Recovery URL selection prefers `browser.harvest.url` over `browser.runtime.tabUrl` and is gated by a shared ChatGPT-conversation-URL check (rejects home, project shell, and external URLs so the persistent profile can't be navigated to the wrong page from stale metadata). Opt out with `--no-recover` on the `session` subcommand. +- MCP: add a dedicated `chatgpt_image` tool plus `generateImage` / `outputPath` support in `consult` so agent callers can trigger the ChatGPT image-aware wait/download path used by CLI `--generate-image`; saved image artifacts now come back in `structuredContent.images`. The `chatgpt_image` output reuses the typed `consult` output contract (`images` / `artifacts` / `resolved`) and its default output path carries a random suffix so concurrent agent calls cannot collide. + +### Security + +- MCP: constrain agent-supplied `generateImage` / `outputPath` to the Oracle home directory (`ORACLE_HOME_DIR`) by default so an MCP caller cannot write generated images or saved responses to arbitrary host paths. `..` traversal is rejected, and the boundary check resolves symlinks in the existing path prefix (via `realpath`) so a symlinked parent under the Oracle home cannot smuggle a write outside it. Set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to opt into external output paths as an explicit decision. CLI `--generate-image` / `--output` are unaffected. 
+- MCP: `chatgpt_image` / `consult` image output fails closed when a remote browser service is configured (`ORACLE_REMOTE_HOST`), since the remote executor does not transfer image artifacts back and the `structuredContent.images` contract could not be fulfilled. The run is rejected with a clear error instead of silently returning no images. ## 0.13.0 — 2026-05-22 diff --git a/README.md b/README.md index 8bd3827db..8112c95ef 100644 --- a/README.md +++ b/README.md @@ -125,6 +125,7 @@ Engine auto-picks API when `OPENAI_API_KEY` is set, otherwise browser; browser i - Browser support: stable on macOS; works on Linux (add `--browser-chrome-path/--browser-cookie-path` when needed) and Windows (manual-login or inline cookies recommended when app-bound cookies block decryption). - Remote browser service: `oracle serve` on a signed-in host; clients use `--remote-host/--remote-token`. - Browser artifacts: browser sessions save `transcript.md` and generated artifacts under `~/.oracle/sessions//artifacts/`. Deep Research saves `deep-research-report.md` when the report surface is captured; ChatGPT-generated images are downloaded with the active browser cookies when image URLs are present. +- MCP image agents: use the `chatgpt_image` tool for the easiest path, or pass `generateImage` with an output path (under `ORACLE_HOME_DIR` unless `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` is set) to `consult` with `engine: "browser"`; saved paths come back in `structuredContent.images`. - Browser archiving: by default, successful non-project, non-Deep-Research, non-multi-turn ChatGPT one-shots are archived after local artifacts are saved. Use `--browser-archive never` to disable or `--browser-archive always` to force archiving after a successful browser run. Archived chats remain manageable in ChatGPT.
- Conversation mode guidance: use one-shot browser runs for narrow bug reports or quick file-set reviews; use explicit browser follow-ups for ambiguous architecture/product tradeoffs where a challenge pass and final decision are valuable; use Deep Research for broad public-web questions that need citations. Oracle never invents follow-ups automatically. - Project Sources: `oracle project-sources list|add --chatgpt-url ` manages the Project Sources tab in ChatGPT browser mode. v1 is append-only (`list`, `add`, `--dry-run`) so agents can share explicit project context without deleting or replacing user sources. diff --git a/docs/browser-mode.md b/docs/browser-mode.md index 05c7b7ab4..949a3b817 100644 --- a/docs/browser-mode.md +++ b/docs/browser-mode.md @@ -238,6 +238,8 @@ oracle --engine browser \ If ChatGPT returns multiple images, the first image saves to the requested path and the rest save as numbered siblings. Without `--generate-image`, Oracle writes images to the session `artifacts/` directory. +MCP agents should prefer the `chatgpt_image` tool. It wraps the same behavior with a smaller input shape, uploads reference files by default, and returns saved files in `structuredContent.images`. Advanced callers can still pass `generateImage` to `consult` directly. + ### Manual login mode (persistent profile, no cookie copy) Use `--browser-manual-login` when cookie decrypt is blocked (e.g., Windows app-bound cookies) or you prefer to sign in explicitly. You can also make it the default via `browser.manualLogin` in `~/.oracle/config.json`. 
diff --git a/docs/mcp.md b/docs/mcp.md index eed65ff55..ce653638a 100644 --- a/docs/mcp.md +++ b/docs/mcp.md @@ -8,22 +8,54 @@ Claude Code can call `oracle-mcp` and ask a subscription-backed ChatGPT browser ## Tools +### `chatgpt_image` + +- Inputs: `prompt` (required), `files?: string[]` for reference images/assets, `outputPath?: string`, `aspectRatio?: string`, `model?: string`, plus browser controls such as `browserThinkingTime`, `browserModelLabel`, `browserModelStrategy`, `browserArchive`, `browserKeepBrowser`, and `dryRun`. +- Behavior: convenience wrapper for ChatGPT browser image generation. It forces `engine:"browser"`, sets `generateImage` for the existing image-aware wait/download path, and defaults `browserAttachments:"always"` when files are provided so reference images are uploaded instead of pasted. +- Output: returns the normal session metadata plus `requestedOutputPath` and `structuredContent.images[]` with saved image paths and ChatGPT file metadata when available. If `outputPath` is omitted, Oracle picks a unique file under `ORACLE_HOME_DIR/generated/`. +- Output path safety: agent-supplied `outputPath` must resolve under `ORACLE_HOME_DIR` by default; paths outside it (`..` traversal, and symlinked parents that escape the home — resolved via `realpath`) are rejected. Set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to allow writing elsewhere as an explicit decision. Omit `outputPath` to use the safe default. +- Local browser only: image output is unsupported when a remote browser service is configured (`ORACLE_REMOTE_HOST`); the image would be written on the remote host and not transferred back, so `chatgpt_image`/`consult` image runs fail closed with a clear error rather than returning empty `structuredContent.images`. Run on the local browser to generate images. 
+ +```json +{ + "prompt": "Create a 9:16 App Store screenshot background for a focus timer.", + "files": ["./reference-screen.png"], + "aspectRatio": "9:16" +} +``` + ### `consult` - Inputs: `prompt` (required), `files?: string[]` (globs), `model?: string` (defaults to CLI), `engine?: "api" | "browser"` (optional; Oracle follows CLI defaults: `ORACLE_ENGINE` and the effective config first, then API when `OPENAI_API_KEY` is set, otherwise browser), `slug?: string`. - Presets: `preset?: "chatgpt-pro-heavy"` applies browser mode + current Pro model alias + extended thinking, unless the request overrides those fields. -- Browser-only extras: `browserAttachments?: "auto"|"never"|"always"`, `browserBundleFiles?: boolean`, `browserBundleFormat?: "text"|"zip"`, `browserThinkingTime?: "light"|"standard"|"extended"|"heavy"`, `browserResearchMode?: "deep"`, `browserFollowUps?: string[]`, `browserArchive?: "auto"|"always"|"never"`, `browserKeepBrowser?: boolean`, `browserModelLabel?: string`, `browserModelStrategy?: "select"|"current"|"ignore"`. +- Browser-only extras: `browserAttachments?: "auto"|"never"|"always"`, `browserBundleFiles?: boolean`, `browserBundleFormat?: "text"|"zip"`, `browserThinkingTime?: "light"|"standard"|"extended"|"heavy"`, `browserResearchMode?: "deep"`, `browserFollowUps?: string[]`, `browserArchive?: "auto"|"always"|"never"`, `browserKeepBrowser?: boolean`, `browserModelLabel?: string`, `browserModelStrategy?: "select"|"current"|"ignore"`, `generateImage?: string`, `outputPath?: string`. - Dry runs: set `dryRun: true` to preview the resolved request without creating a session or touching the browser. - Behavior: starts a session, runs it with the chosen engine, returns final output + metadata. Background/foreground follows the CLI (e.g., GPT‑5 Pro detaches by default). 
If API mode fails because `OPENAI_API_KEY` is missing and you have ChatGPT Pro, retry with `engine: "browser"` or `preset: "chatgpt-pro-heavy"` to use your signed-in ChatGPT session instead of an API key. - Logging: emits MCP logs (`info` per line, `debug` for streamed chunks with byte sizes). If browser prerequisites are missing, returns an error payload instead of running. - Research mode: set `browserResearchMode:"deep"` for broad public-web research and cited reports. Use normal browser runs with `gpt-5.5-pro` + `browserThinkingTime:"extended"` for Pro Extended code review, or `gpt-5.5` + `browserThinkingTime:"heavy"` when you explicitly want Thinking Heavy. - Multi-turn consults: set `browserFollowUps:["Challenge your recommendation", "Give the final decision"]` to keep one ChatGPT browser conversation open and ask sequential follow-up prompts. Use one-shot calls for narrow bugs and exact file-set reviews; use multi-turn for ambiguous architecture/product decisions where a challenge pass and final recommendation are useful; use Deep Research for broad public-web work with citations. Oracle never invents follow-ups automatically. - Archiving: set `browserArchive:"auto"|"always"|"never"` to control ChatGPT conversation cleanup. `auto` archives only successful browser one-shots after local artifacts are saved, and skips project, Deep Research, multi-turn, failed, and incomplete sessions. +- ChatGPT image generation: set `engine:"browser"` and `generateImage` to a path under `ORACLE_HOME_DIR` to use the same image-aware wait/download path as CLI `--generate-image`. Saved files are returned in `structuredContent.images` and recorded as session artifacts; multiple images save as numbered siblings. Agent-supplied `generateImage` / `outputPath` are constrained to `ORACLE_HOME_DIR` by default (set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to allow external paths). 
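The output-path constraint described in the bullets above (reject `..` traversal, and resolve symlinks in the existing path prefix via `realpath`) can be illustrated with a minimal sketch. This is not the code from this change: `realpathExistingPrefix` and `isPathInside` are hypothetical names, resolving relative candidates against the base directory is an assumption, and a dangling symlink would need extra handling that is omitted here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: resolve symlinks in the longest existing prefix of
// `target`, then re-append the components that do not exist yet.
function realpathExistingPrefix(target: string): string {
  let current = path.resolve(target);
  const remainder: string[] = [];
  for (;;) {
    try {
      return path.join(fs.realpathSync(current), ...remainder);
    } catch {
      const parent = path.dirname(current);
      if (parent === current) {
        // Reached the filesystem root without finding an existing prefix.
        return path.join(current, ...remainder);
      }
      remainder.unshift(path.basename(current));
      current = parent;
    }
  }
}

// Hypothetical check: true only when `candidate` resolves strictly under `baseDir`.
// Assumption: relative candidates are resolved against `baseDir`.
function isPathInside(baseDir: string, candidate: string): boolean {
  const base = realpathExistingPrefix(baseDir);
  const target = realpathExistingPrefix(path.resolve(baseDir, candidate));
  const relative = path.relative(base, target);
  if (relative === "") {
    return false; // the base directory itself is not a usable output file
  }
  const escapes =
    relative === ".." ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  return !escapes;
}
```

Comparing the two real paths with `path.relative`, rather than a string-prefix test, avoids treating a sibling such as `home-evil/` as if it were inside `home/`.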
#### Long browser consults from agents Browser-backed GPT-5.5 Pro consults can legitimately run for many minutes. Some MCP clients show little progress while a tool call is active, so agents should treat a long Oracle call as a running browser job, not as a failed step. Start with `dryRun:true` when configuring a new agent, prefer `preset:"chatgpt-pro-heavy"` or `engine:"browser"` explicitly, and use the shared session store (`sessions`, `oracle status`, or `oracle session `) before retrying a prompt. If the browser control plan says Oracle will launch visible Chrome, use attach/remote Chrome when the operator is actively using the computer. +#### ChatGPT images from agents + +For generated images, pass an explicit `generateImage` path. That opt-in is important because it switches the browser wait loop to watch for ChatGPT image artifacts instead of only assistant text. The path must resolve under `ORACLE_HOME_DIR` unless `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` is set. + +```json +{ + "engine": "browser", + "model": "gpt-5.5-pro", + "prompt": "Create a 9:16 App Store screenshot background for a focus timer.", + "generateImage": "${ORACLE_HOME_DIR}/generated/focus-timer-bg.png" +} +``` + +The MCP response includes `structuredContent.images[]` with the saved file path, MIME type, size, and ChatGPT file metadata when available. + ### `sessions` - Inputs: `{id?, hours?, limit?, includeAll?, detail?}` mirroring `oracle status` / `oracle session`. 
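The `structuredContent.images[]` payload documented in the `docs/mcp.md` changes above could be read by an agent-side caller roughly as follows. This is a sketch under assumptions: the field names follow the image summary shape added in this diff, but the `ImageSummary` and `ImageToolResult` interfaces and the `savedImagePaths` helper are hypothetical and not part of the change.

```typescript
// Hypothetical subset of the image summary fields (kind, path, mimeType,
// sizeBytes, fileId) described for structuredContent.images.
interface ImageSummary {
  kind: "image";
  path: string;
  mimeType?: string;
  sizeBytes?: number;
  fileId?: string;
}

// Hypothetical view of a tool result; only the fields used below are modeled.
interface ImageToolResult {
  isError?: boolean;
  structuredContent?: {
    status?: string;
    requestedOutputPath?: string;
    images?: ImageSummary[];
  };
}

// Return the saved image paths, or an empty list for error results and for
// results that carry no images (for example a dry run).
function savedImagePaths(result: ImageToolResult): string[] {
  if (result.isError) {
    return [];
  }
  return (result.structuredContent?.images ?? []).map((image) => image.path);
}
```

Treating a missing `images` array the same as an empty one keeps the caller's handling uniform across dry runs, error payloads, and runs that produced no image.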
diff --git a/src/browser/chatgptImages.ts b/src/browser/chatgptImages.ts index 2503fa1aa..eb0a46e60 100644 --- a/src/browser/chatgptImages.ts +++ b/src/browser/chatgptImages.ts @@ -1,5 +1,6 @@ import fs from "node:fs/promises"; import path from "node:path"; +import { randomUUID } from "node:crypto"; import type { BrowserGeneratedImage, BrowserLogger, @@ -203,9 +204,10 @@ function resolveDefaultGeneratedImagePath( sessionId?: string, ): string { const primary = images[0]; - const stemSource = - primary?.fileId || primary?.alt || primary?.url || `generated-${Date.now().toString(36)}`; - const stem = sanitizeGeneratedImageStem(stemSource) || `generated-${Date.now().toString(36)}`; + // Random fallback token keeps concurrent session-less saves from colliding. + const uniqueFallback = `generated-${Date.now().toString(36)}-${randomUUID().slice(0, 8)}`; + const stemSource = primary?.fileId || primary?.alt || primary?.url || uniqueFallback; + const stem = sanitizeGeneratedImageStem(stemSource) || uniqueFallback; const baseDir = sessionId ? 
resolveSessionArtifactsDir(sessionId) : path.join(getOracleHomeDir(), ".temp"); diff --git a/src/mcp/server.ts b/src/mcp/server.ts index 743bb4f2b..2a3431258 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -5,6 +5,7 @@ import { pathToFileURL } from "node:url"; import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; import { getCliVersion } from "../version.js"; +import { registerChatGptImageTool } from "./tools/chatgptImage.js"; import { registerConsultTool } from "./tools/consult.js"; import { registerProjectSourcesTool } from "./tools/projectSources.js"; import { registerSessionsTool } from "./tools/sessions.js"; @@ -24,6 +25,7 @@ export async function startMcpServer(): Promise { ); registerConsultTool(server); + registerChatGptImageTool(server); registerProjectSourcesTool(server); registerSessionsTool(server); registerSessionResources(server); diff --git a/src/mcp/tools/chatgptImage.ts b/src/mcp/tools/chatgptImage.ts new file mode 100644 index 000000000..e9343f9a3 --- /dev/null +++ b/src/mcp/tools/chatgptImage.ts @@ -0,0 +1,142 @@ +import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; +import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js"; +import path from "node:path"; +import { randomUUID } from "node:crypto"; +import { z } from "zod"; +import { getOracleHomeDir } from "../../oracleHome.js"; +import type { ConsultInput } from "../types.js"; +import { consultOutputShape, runConsultTool } from "./consult.js"; + +const chatGptImageInputShape = { + prompt: z.string().min(1, "Prompt is required.").describe("Image generation prompt."), + files: z + .array(z.string()) + .default([]) + .describe("Optional reference image/file paths or globs to upload to ChatGPT."), + outputPath: z + .string() + .optional() + .describe( + "Where to save the first generated image. 
Defaults to a unique file under ORACLE_HOME_DIR/generated/.", + ), + aspectRatio: z + .string() + .optional() + .describe('Optional requested image aspect ratio, e.g. "1:1", "9:16", or "16:9".'), + model: z + .string() + .optional() + .describe("Optional ChatGPT/browser model label or alias. Defaults follow Oracle config."), + browserModelLabel: z.string().optional().describe("Explicit ChatGPT UI model label to select."), + browserAttachments: z + .enum(["auto", "never", "always"]) + .optional() + .describe( + 'How to deliver files. Defaults to "always" when files are present so reference images are uploaded.', + ), + browserThinkingTime: z + .enum(["light", "standard", "extended", "heavy"]) + .optional() + .describe("Set ChatGPT thinking time when supported by the chosen model."), + browserModelStrategy: z + .enum(["select", "current", "ignore"]) + .optional() + .describe("Model picker strategy. Mirrors the consult tool and CLI browser flag."), + browserArchive: z + .enum(["auto", "always", "never"]) + .optional() + .describe("Archive completed ChatGPT conversations after local artifacts are saved."), + browserKeepBrowser: z + .boolean() + .optional() + .describe("Keep Chrome running after completion for debugging."), + dryRun: z + .boolean() + .optional() + .describe("Preview the resolved image run without touching the browser."), + slug: z.string().optional().describe("Optional human-friendly session id."), +} satisfies z.ZodRawShape; + +const chatGptImageOutputShape = { + // Mirror the consult output contract so structuredContent stays consistent + // (images/artifacts/resolved are typed by the shared consult shapes), plus the + // image-specific echo of the requested path. 
+ ...consultOutputShape, + requestedOutputPath: z.string(), +} satisfies z.ZodRawShape; + +const chatGptImageInputSchema = z.object(chatGptImageInputShape).strict(); + +export type ChatGptImageInput = z.infer; + +function resolveDefaultImageOutputPath(): string { + // Include a random token so concurrent agent calls in the same millisecond do + // not resolve to the same default path and overwrite each other. + const unique = `${Date.now().toString(36)}-${randomUUID().slice(0, 8)}`; + return path.join(getOracleHomeDir(), "generated", `chatgpt-image-${unique}.png`); +} + +function appendAspectRatio(prompt: string, aspectRatio?: string): string { + const requestedAspectRatio = aspectRatio?.trim(); + if (!requestedAspectRatio) { + return prompt.trim(); + } + return `${prompt.trim()}\n\nCreate the image with aspect ratio ${requestedAspectRatio}.`; +} + +export function buildChatGptImageConsultInput(input: ChatGptImageInput): ConsultInput { + const files = input.files ?? []; + const outputPath = input.outputPath?.trim() || resolveDefaultImageOutputPath(); + const browserAttachments = + input.browserAttachments ?? (files.length > 0 ? ("always" as const) : undefined); + return { + prompt: appendAspectRatio(input.prompt, input.aspectRatio), + files, + model: input.model, + engine: "browser", + browserModelLabel: input.browserModelLabel, + browserAttachments, + browserThinkingTime: input.browserThinkingTime, + browserModelStrategy: input.browserModelStrategy, + browserArchive: input.browserArchive, + browserKeepBrowser: input.browserKeepBrowser, + generateImage: outputPath, + dryRun: input.dryRun, + slug: input.slug, + }; +} + +export function registerChatGptImageTool(server: McpServer): void { + server.registerTool( + "chatgpt_image", + { + title: "Generate an image with ChatGPT", + description: + "Agent-friendly wrapper for ChatGPT browser image generation. 
It selects browser mode, enables the image-aware wait/download path, uploads reference files when provided, and returns saved image paths in structuredContent.images.", + inputSchema: chatGptImageInputShape, + outputSchema: chatGptImageOutputShape, + }, + async (input: unknown): Promise => { + const textContent = (text: string) => [{ type: "text" as const, text }]; + let parsed; + try { + parsed = chatGptImageInputSchema.parse(input); + } catch (error) { + return { + isError: true, + content: textContent(error instanceof Error ? error.message : String(error)), + }; + } + const consultInput = buildChatGptImageConsultInput(parsed); + const result = await runConsultTool(consultInput, { server: server.server }); + const structuredContent = { + ...(result.structuredContent ?? {}), + requestedOutputPath: consultInput.generateImage, + }; + return { + ...result, + structuredContent, + }; + }, + ); +} diff --git a/src/mcp/tools/consult.ts b/src/mcp/tools/consult.ts index 1f93be7d5..a3260c37b 100644 --- a/src/mcp/tools/consult.ts +++ b/src/mcp/tools/consult.ts @@ -1,9 +1,14 @@ import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { z } from "zod"; import { getCliVersion } from "../../version.js"; -import { LoggingMessageNotificationParamsSchema } from "@modelcontextprotocol/sdk/types.js"; +import { + LoggingMessageNotificationParamsSchema, + type CallToolResult, +} from "@modelcontextprotocol/sdk/types.js"; import { ensureBrowserAvailable, mapConsultToRunOptions } from "../utils.js"; -import type { BrowserSessionConfig, SessionModelRun } from "../../sessionStore.js"; +import type { RunOracleOptions } from "../../oracle.js"; +import type { EngineMode } from "../../cli/engine.js"; +import type { BrowserSessionConfig, SessionArtifact, SessionModelRun } from "../../sessionStore.js"; import { sessionStore } from "../../sessionStore.js"; import { resolveRemoteServiceConfig } from "../../remote/remoteServiceConfig.js"; import { createRemoteBrowserExecutor 
} from "../../remote/client.js"; @@ -113,6 +118,18 @@ const consultInputShape = { .boolean() .optional() .describe("Browser-only: keep Chrome running after completion (useful for debugging)."), + generateImage: z + .string() + .optional() + .describe( + "Browser-only: save generated image(s) to this file path. For ChatGPT browser mode this enables the image-aware wait/download path used by CLI --generate-image.", + ), + outputPath: z + .string() + .optional() + .describe( + "Browser-only image output fallback path, mirroring the CLI --output option for image operations.", + ), dryRun: z .boolean() .optional() @@ -159,6 +176,25 @@ const consultModelSummaryShape = z.object({ logPath: z.string().optional(), }); +const consultArtifactSummaryShape = z.object({ + kind: z.string(), + path: z.string(), + label: z.string().optional(), + mimeType: z.string().optional(), + sizeBytes: z.number().optional(), + sourceUrl: z.string().optional(), +}); + +const consultImageSummaryShape = consultArtifactSummaryShape.extend({ + kind: z.literal("image"), + url: z.string().optional(), + finalUrl: z.string().optional(), + alt: z.string().optional(), + width: z.number().optional(), + height: z.number().optional(), + fileId: z.string().optional(), +}); + const consultDryRunResolvedShape = z.object({ resolvedEngine: z.enum(["api", "browser"]), model: z.string(), @@ -178,21 +214,26 @@ const consultDryRunResolvedShape = z.object({ manualLogin: z.boolean().optional(), profileDir: z.string().nullable().optional(), chatgptUrl: z.string().nullable().optional(), + imageOutputPath: z.string().nullable().optional(), }) .optional(), guidance: z.array(z.string()), }); -const consultOutputShape = { +export const consultOutputShape = { sessionId: z.string().optional(), status: z.string(), output: z.string(), dryRun: z.boolean().optional(), resolved: consultDryRunResolvedShape.optional(), models: z.array(consultModelSummaryShape).optional(), + artifacts: z.array(consultArtifactSummaryShape).optional(), 
+ images: z.array(consultImageSummaryShape).optional(), } satisfies z.ZodRawShape; export type ConsultModelSummary = z.infer; +export type ConsultArtifactSummary = z.infer; +export type ConsultImageSummary = z.infer; export type ConsultDryRunResolved = z.infer; export function summarizeModelRunsForConsult( @@ -228,6 +269,55 @@ export function summarizeModelRunsForConsult( }); } +function optionalString(value: unknown): string | undefined { + return typeof value === "string" && value.length > 0 ? value : undefined; +} + +function optionalNumber(value: unknown): number | undefined { + return typeof value === "number" && Number.isFinite(value) ? value : undefined; +} + +export function summarizeArtifactsForConsult( + artifacts?: SessionArtifact[] | null, +): ConsultArtifactSummary[] | undefined { + if (!artifacts || artifacts.length === 0) { + return undefined; + } + return artifacts.map((artifact) => ({ + kind: artifact.kind, + path: artifact.path, + label: artifact.label, + mimeType: artifact.mimeType, + sizeBytes: artifact.sizeBytes, + sourceUrl: artifact.sourceUrl, + })); +} + +export function summarizeImageArtifactsForConsult( + artifacts?: SessionArtifact[] | null, +): ConsultImageSummary[] | undefined { + const images = (artifacts ?? []) + .filter((artifact) => artifact.kind === "image") + .map((artifact) => { + const image = artifact as SessionArtifact & Record; + return { + kind: "image" as const, + path: artifact.path, + label: artifact.label, + mimeType: artifact.mimeType, + sizeBytes: artifact.sizeBytes, + sourceUrl: artifact.sourceUrl, + url: optionalString(image.url), + finalUrl: optionalString(image.finalUrl), + alt: optionalString(image.alt), + width: optionalNumber(image.width), + height: optionalNumber(image.height), + fileId: optionalString(image.fileId), + }; + }); + return images.length > 0 ? 
images : undefined; +} + export function buildConsultBrowserConfig({ userConfig, env, @@ -335,6 +425,12 @@ export function buildConsultDryRunResolved({ "This is a multi-turn browser consult; all follow-ups stay in one ChatGPT conversation.", ); } + const imageOutputPath = runOptions.generateImage ?? runOptions.outputPath ?? null; + if (resolvedEngine === "browser" && imageOutputPath) { + guidance.push( + "ChatGPT generated images will use the image-aware wait/download path and return saved files in structuredContent.images.", + ); + } return { resolvedEngine, model: runOptions.model, @@ -355,6 +451,7 @@ export function buildConsultDryRunResolved({ manualLogin: browserConfig?.manualLogin, profileDir: browserConfig?.manualLoginProfileDir ?? null, chatgptUrl, + imageOutputPath, } : undefined, guidance, @@ -387,6 +484,9 @@ export function formatConsultDryRunResolved(details: ConsultDryRunResolved): str if (details.browser.chatgptUrl) { lines.push(` ChatGPT URL: ${details.browser.chatgptUrl}`); } + if (details.browser.imageOutputPath) { + lines.push(` image output: ${details.browser.imageOutputPath}`); + } } lines.push(` follow-ups: ${details.followUpCount}`); for (const guidance of details.guidance) { @@ -395,235 +495,273 @@ export function formatConsultDryRunResolved(details: ConsultDryRunResolved): str return lines; } -export function registerConsultTool(server: McpServer): void { - server.registerTool( - "consult", - { - title: "Run an oracle session", - description: - 'Run an Oracle session (API or ChatGPT browser automation). Use `files` to attach project context. If `engine` is omitted, Oracle follows CLI defaults: config/ORACLE_ENGINE first, then API when OPENAI_API_KEY is set, otherwise browser. Browser GPT-5.5 Pro consults can take many minutes; use `dryRun:true` first when configuring an agent and inspect `sessions`/`oracle status` before retrying. 
Browser manual-login uses a private Oracle Chrome profile separate from the user\'s normal Chrome; dry-run output includes first-time setup guidance when that path is active. For browser-based image/file uploads, set `browserAttachments:"always"`. Browser consults can include `browserFollowUps` for a multi-turn ChatGPT review in one conversation. Sessions are stored under `ORACLE_HOME_DIR` (shared with the CLI).', - // Cast to any to satisfy SDK typings across differing Zod versions. - inputSchema: consultInputShape, - outputSchema: consultOutputShape, - }, - async (input: unknown) => { - const textContent = (text: string) => [{ type: "text" as const, text }]; - let parsedInput; - try { - parsedInput = applyConsultPreset(consultInputSchema.parse(input)); - } catch (error) { - return { - isError: true, - content: textContent(error instanceof Error ? error.message : String(error)), - }; - } - const { - prompt, - files, - model, - models, - engine, - search, - browserModelLabel, - browserAttachments, - browserBundleFiles, - browserBundleFormat, - browserThinkingTime, - browserModelStrategy, - browserResearchMode, - browserArchive, - browserFollowUps, - browserKeepBrowser, - dryRun, - slug, - } = parsedInput; - const { config: userConfig } = await loadUserConfig(); - const { runOptions, resolvedEngine } = mapConsultToRunOptions({ - prompt, - files: files ?? 
[], - model, - models, - engine, - search, - browserAttachments, - browserBundleFiles, - browserBundleFormat, - browserFollowUps, - userConfig, - env: process.env, - }); - const cwd = process.cwd(); - const sendLog = (text: string, level: "info" | "debug" = "info") => - server.server - .sendLoggingMessage( - LoggingMessageNotificationParamsSchema.parse({ - level, - data: { text, bytes: Buffer.byteLength(text, "utf8") }, - }), - ) - .catch(() => {}); - - const resolvedRemote = resolveRemoteServiceConfig({ userConfig, env: process.env }); - - let browserConfig: BrowserSessionConfig | undefined; - if (resolvedEngine === "browser") { - browserConfig = buildConsultBrowserConfig({ - userConfig, - env: process.env, - runModel: runOptions.model, - inputModel: model, - browserModelLabel, - browserThinkingTime, - browserModelStrategy, - browserResearchMode, - browserArchive, - browserKeepBrowser, - }); - } +type McpLoggingServer = Pick; - if (dryRun) { - const lines: string[] = []; - const log = (line: string): void => { - lines.push(line); - sendLog(line); - }; - const resolved = buildConsultDryRunResolved({ - resolvedEngine, - runOptions, - browserConfig, - }); - await runDryRunSummary({ - engine: resolvedEngine, - runOptions, - cwd, - version: getCliVersion(), - log, - browserConfig, - }); - for (const line of formatConsultDryRunResolved(resolved)) { - log(line); - } - const output = lines.join("\n").trim(); - return { - content: textContent(output), - structuredContent: { - status: "dry-run", - output, - dryRun: true, - resolved, - }, - }; - } +export async function runConsultTool( + input: unknown, + { server }: { server: McpLoggingServer }, +): Promise { + const textContent = (text: string) => [{ type: "text" as const, text }]; + let parsedInput; + try { + parsedInput = applyConsultPreset(consultInputSchema.parse(input)); + } catch (error) { + return { + isError: true, + content: textContent(error instanceof Error ? 
error.message : String(error)), + }; + } + const { + prompt, + files, + model, + models, + engine, + search, + browserModelLabel, + browserAttachments, + browserBundleFiles, + browserBundleFormat, + browserThinkingTime, + browserModelStrategy, + browserResearchMode, + browserArchive, + browserFollowUps, + browserKeepBrowser, + generateImage, + outputPath, + dryRun, + slug, + } = parsedInput; + const { config: userConfig } = await loadUserConfig(); + let runOptions: RunOracleOptions; + let resolvedEngine: EngineMode; + try { + ({ runOptions, resolvedEngine } = mapConsultToRunOptions({ + prompt, + files: files ?? [], + model, + models, + engine, + search, + browserAttachments, + browserBundleFiles, + browserBundleFormat, + browserFollowUps, + generateImage, + outputPath, + userConfig, + env: process.env, + })); + } catch (error) { + return { + isError: true, + content: textContent(error instanceof Error ? error.message : String(error)), + }; + } + const cwd = process.cwd(); + const sendLog = (text: string, level: "info" | "debug" = "info") => + server + .sendLoggingMessage( + LoggingMessageNotificationParamsSchema.parse({ + level, + data: { text, bytes: Buffer.byteLength(text, "utf8") }, + }), + ) + .catch(() => {}); - const browserGuard = ensureBrowserAvailable(resolvedEngine, { - remoteHost: resolvedRemote.host, - }); - if (resolvedEngine === "browser" && browserGuard) { - return { - isError: true, - content: textContent(browserGuard), - }; - } + const resolvedRemote = resolveRemoteServiceConfig({ userConfig, env: process.env }); - let browserDeps: BrowserSessionRunnerDeps | undefined; - if (resolvedEngine === "browser" && resolvedRemote.host) { - if (!resolvedRemote.token) { - return { - isError: true, - content: textContent( - `Remote host configured (${resolvedRemote.host}) but remote token is missing. 
Run \`oracle bridge client --connect <...>\` or set ORACLE_REMOTE_TOKEN.`, - ), - }; - } - browserDeps = { - executeBrowser: createRemoteBrowserExecutor({ - host: resolvedRemote.host, - token: resolvedRemote.token, - }), - }; - } + let browserConfig: BrowserSessionConfig | undefined; + if (resolvedEngine === "browser") { + browserConfig = buildConsultBrowserConfig({ + userConfig, + env: process.env, + runModel: runOptions.model, + inputModel: model, + browserModelLabel, + browserThinkingTime, + browserModelStrategy, + browserResearchMode, + browserArchive, + browserKeepBrowser, + }); + } - const notifications = resolveNotificationSettings({ - cliNotify: undefined, - cliNotifySound: undefined, - env: process.env, - config: userConfig.notify, - }); + if (dryRun) { + const lines: string[] = []; + const log = (line: string): void => { + lines.push(line); + sendLog(line); + }; + const resolved = buildConsultDryRunResolved({ + resolvedEngine, + runOptions, + browserConfig, + }); + await runDryRunSummary({ + engine: resolvedEngine, + runOptions, + cwd, + version: getCliVersion(), + log, + browserConfig, + }); + for (const line of formatConsultDryRunResolved(resolved)) { + log(line); + } + const output = lines.join("\n").trim(); + return { + content: textContent(output), + structuredContent: { + status: "dry-run", + output, + dryRun: true, + resolved, + }, + }; + } - const sessionMeta = await sessionStore.createSession( - { - ...runOptions, - mode: resolvedEngine, - slug, - browserConfig, - waitPreference: true, - }, - cwd, - notifications, - ); + const browserGuard = ensureBrowserAvailable(resolvedEngine, { + remoteHost: resolvedRemote.host, + }); + if (resolvedEngine === "browser" && browserGuard) { + return { + isError: true, + content: textContent(browserGuard), + }; + } - const logWriter = sessionStore.createLogWriter(sessionMeta.id); - // Stream logs to both the session log and MCP logging notifications, but avoid buffering in memory - const log = (line?: string): 
void => { - logWriter.logLine(line); - if (line !== undefined) { - sendLog(line); - } + let browserDeps: BrowserSessionRunnerDeps | undefined; + if (resolvedEngine === "browser" && resolvedRemote.host) { + if (!resolvedRemote.token) { + return { + isError: true, + content: textContent( + `Remote host configured (${resolvedRemote.host}) but remote token is missing. Run \`oracle bridge client --connect <...>\` or set ORACLE_REMOTE_TOKEN.`, + ), }; - const write = (chunk: string): boolean => { - logWriter.writeChunk(chunk); - sendLog(chunk, "debug"); - return true; + } + // Fail closed for image output over the remote browser service: the remote + // executor does not thread image artifacts back through the protocol, so the + // generated image would be written on the remote host and the promised + // structuredContent.images contract could not be fulfilled. Better to reject + // explicitly than silently return no images. + const imageOutputPath = runOptions.generateImage ?? runOptions.outputPath; + if (imageOutputPath) { + return { + isError: true, + content: textContent( + `ChatGPT image output is not supported with a remote browser service (ORACLE_REMOTE_HOST=${resolvedRemote.host}): generated images are written on the remote host and are not transferred back, so structuredContent.images cannot be returned. 
Unset the remote host to generate images on the local browser, or omit generateImage/outputPath.`, + ), }; + } + browserDeps = { + executeBrowser: createRemoteBrowserExecutor({ + host: resolvedRemote.host, + token: resolvedRemote.token, + }), + }; + } + + const notifications = resolveNotificationSettings({ + cliNotify: undefined, + cliNotifySound: undefined, + env: process.env, + config: userConfig.notify, + }); + + const sessionMeta = await sessionStore.createSession( + { + ...runOptions, + mode: resolvedEngine, + slug, + browserConfig, + waitPreference: true, + }, + cwd, + notifications, + ); + + const logWriter = sessionStore.createLogWriter(sessionMeta.id); + // Stream logs to both the session log and MCP logging notifications, but avoid buffering in memory + const log = (line?: string): void => { + logWriter.logLine(line); + if (line !== undefined) { + sendLog(line); + } + }; + const write = (chunk: string): boolean => { + logWriter.writeChunk(chunk); + sendLog(chunk, "debug"); + return true; + }; + + try { + await performSessionRun({ + sessionMeta, + runOptions, + mode: resolvedEngine, + browserConfig, + cwd, + log, + write, + version: getCliVersion(), + notifications, + muteStdout: true, + browserDeps, + }); + } catch (error) { + log(`Run failed: ${error instanceof Error ? error.message : String(error)}`); + return { + isError: true, + content: textContent( + `Session ${sessionMeta.id} failed: ${error instanceof Error ? error.message : String(error)}`, + ), + }; + } finally { + logWriter.stream.end(); + } - try { - await performSessionRun({ - sessionMeta, - runOptions, - mode: resolvedEngine, - browserConfig, - cwd, - log, - write, - version: getCliVersion(), - notifications, - muteStdout: true, - browserDeps, - }); - } catch (error) { - log(`Run failed: ${error instanceof Error ? error.message : String(error)}`); - return { - isError: true, - content: textContent( - `Session ${sessionMeta.id} failed: ${error instanceof Error ? 
error.message : String(error)}`, - ), - }; - } finally { - logWriter.stream.end(); - } + try { + const finalMeta = (await sessionStore.readSession(sessionMeta.id)) ?? sessionMeta; + const summary = `Session ${sessionMeta.id} (${finalMeta.status})`; + const logTail = await readSessionLogTail(sessionMeta.id, 4000); + const modelsSummary = summarizeModelRunsForConsult(finalMeta.models); + const artifacts = summarizeArtifactsForConsult(finalMeta.artifacts); + const images = summarizeImageArtifactsForConsult(finalMeta.artifacts); + return { + content: textContent([summary, logTail || "(log empty)"].join("\n").trim()), + structuredContent: { + sessionId: sessionMeta.id, + status: finalMeta.status, + output: logTail ?? "", + models: modelsSummary, + artifacts, + images, + }, + }; + } catch (error) { + return { + isError: true, + content: textContent( + `Session completed but metadata fetch failed: ${error instanceof Error ? error.message : String(error)}`, + ), + }; + } +} - try { - const finalMeta = (await sessionStore.readSession(sessionMeta.id)) ?? sessionMeta; - const summary = `Session ${sessionMeta.id} (${finalMeta.status})`; - const logTail = await readSessionLogTail(sessionMeta.id, 4000); - const modelsSummary = summarizeModelRunsForConsult(finalMeta.models); - return { - content: textContent([summary, logTail || "(log empty)"].join("\n").trim()), - structuredContent: { - sessionId: sessionMeta.id, - status: finalMeta.status, - output: logTail ?? "", - models: modelsSummary, - }, - }; - } catch (error) { - return { - isError: true, - content: textContent( - `Session completed but metadata fetch failed: ${error instanceof Error ? error.message : String(error)}`, - ), - }; - } +export function registerConsultTool(server: McpServer): void { + server.registerTool( + "consult", + { + title: "Run an oracle session", + description: + 'Run an Oracle session (API or ChatGPT browser automation). Use `files` to attach project context. 
If `engine` is omitted, Oracle follows CLI defaults: config/ORACLE_ENGINE first, then API when OPENAI_API_KEY is set, otherwise browser. Browser GPT-5.5 Pro consults can take many minutes; use `dryRun:true` first when configuring an agent and inspect `sessions`/`oracle status` before retrying. Browser manual-login uses a private Oracle Chrome profile separate from the user\'s normal Chrome; dry-run output includes first-time setup guidance when that path is active. For browser-based image/file uploads, set `browserAttachments:"always"`. For ChatGPT image generation, set `generateImage` to enable the same image wait/download path as CLI --generate-image and read returned paths from `images`. Browser consults can include `browserFollowUps` for a multi-turn ChatGPT review in one conversation. Sessions are stored under `ORACLE_HOME_DIR` (shared with the CLI).', + // Cast to any to satisfy SDK typings across differing Zod versions. + inputSchema: consultInputShape, + outputSchema: consultOutputShape, }, + async (input: unknown) => runConsultTool(input, { server: server.server }), ); } diff --git a/src/mcp/types.ts b/src/mcp/types.ts index 2ba6fc0e1..33aaeeef9 100644 --- a/src/mcp/types.ts +++ b/src/mcp/types.ts @@ -20,6 +20,8 @@ export const consultInputSchema = z browserArchive: z.enum(["auto", "always", "never"]).optional(), browserFollowUps: z.array(z.string()).optional(), browserKeepBrowser: z.boolean().optional(), + generateImage: z.string().optional(), + outputPath: z.string().optional(), dryRun: z.boolean().optional(), search: z.boolean().optional(), slug: z.string().optional(), diff --git a/src/mcp/utils.ts b/src/mcp/utils.ts index 1bd466ba0..88950b036 100644 --- a/src/mcp/utils.ts +++ b/src/mcp/utils.ts @@ -1,9 +1,87 @@ +import fs from "node:fs"; +import path from "node:path"; import type { RunOracleOptions } from "../oracle.js"; import type { EngineMode } from "../cli/engine.js"; import type { UserConfig } from "../config.js"; import { 
resolveRunOptionsFromConfig } from "../cli/runOptions.js"; +import { getOracleHomeDir } from "../oracleHome.js"; import { Launcher } from "chrome-launcher"; +const ALLOW_EXTERNAL_OUTPUT_ENV = "ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT"; + +/** + * Whether MCP callers may write generated images / saved responses outside the + * Oracle home directory. Off by default: MCP clients are less trusted than the + * CLI user, so an agent must not be able to write to arbitrary host paths. + */ +export function isExternalMcpOutputAllowed(env: NodeJS.ProcessEnv = process.env): boolean { + const raw = env[ALLOW_EXTERNAL_OUTPUT_ENV]?.trim().toLowerCase(); + return raw === "1" || raw === "true" || raw === "yes" || raw === "on"; +} + +function realpathOrSelf(target: string): string { + try { + return fs.realpathSync(target); + } catch { + return path.resolve(target); + } +} + +/** + * Resolve `target` through symlinks for the portion that exists on disk, then + * re-append the not-yet-created remainder. A lexical `path.resolve` is not + * enough: a symlinked parent under the Oracle home (e.g. `~/.oracle/generated` + * -> `/tmp/evil`) would pass a string-prefix check while the actual write lands + * outside the boundary. realpath-ing the deepest existing ancestor closes that. + */ +function resolveThroughSymlinks(target: string): string { + const resolved = path.resolve(target); + let current = resolved; + const tail: string[] = []; + // Bounded by the number of path segments; dirname() converges to the root. + for (;;) { + try { + const real = fs.realpathSync(current); + return tail.length ? path.join(real, ...tail.toReversed()) : real; + } catch { + const parent = path.dirname(current); + if (parent === current) { + return resolved; + } + tail.push(path.basename(current)); + current = parent; + } + } +} + +/** + * Constrain an MCP-supplied output path to the Oracle home directory and return + * its resolved absolute form. 
`path.resolve` collapses `..` (traversal escapes + * are rejected) and the boundary check is performed after resolving symlinks in + * the existing path prefix, so a symlinked parent cannot smuggle a write + * outside the home. Set ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT to opt into writing + * outside the Oracle home as an explicit decision. + */ +export function resolveMcpOutputPath( + requestedPath: string, + field: "generateImage" | "outputPath", + env: NodeJS.ProcessEnv = process.env, +): string { + const resolved = path.resolve(requestedPath); + if (isExternalMcpOutputAllowed(env)) { + return resolved; + } + const root = realpathOrSelf(getOracleHomeDir()); + const realTarget = resolveThroughSymlinks(resolved); + if (realTarget === root || realTarget.startsWith(`${root}${path.sep}`)) { + return resolved; + } + throw new Error( + `MCP "${field}" must resolve under the Oracle home directory (${root}); got "${realTarget}". ` + + `Use a path under that directory, or set ${ALLOW_EXTERNAL_OUTPUT_ENV}=1 to allow external output paths.`, + ); +} + export function mapConsultToRunOptions({ prompt, files, @@ -15,6 +93,8 @@ export function mapConsultToRunOptions({ browserBundleFiles, browserBundleFormat, browserFollowUps, + generateImage, + outputPath, userConfig, env = process.env, }: { @@ -28,6 +108,8 @@ export function mapConsultToRunOptions({ browserBundleFiles?: boolean; browserBundleFormat?: "text" | "zip"; browserFollowUps?: string[]; + generateImage?: string; + outputPath?: string; userConfig?: UserConfig; env?: NodeJS.ProcessEnv; }): { runOptions: RunOracleOptions; resolvedEngine: EngineMode } { @@ -63,6 +145,14 @@ export function mapConsultToRunOptions({ .map((entry) => entry.trim()) .filter(Boolean); } + const imageOutputPath = generateImage?.trim(); + if (imageOutputPath) { + result.runOptions.generateImage = resolveMcpOutputPath(imageOutputPath, "generateImage", env); + } + const secondaryOutputPath = outputPath?.trim(); + if (secondaryOutputPath) { + 
result.runOptions.outputPath = resolveMcpOutputPath(secondaryOutputPath, "outputPath", env); + } return result; } diff --git a/tests/mcp/chatgptImage.test.ts b/tests/mcp/chatgptImage.test.ts new file mode 100644 index 000000000..281d7a26e --- /dev/null +++ b/tests/mcp/chatgptImage.test.ts @@ -0,0 +1,105 @@ +import { mkdtempSync, rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import path from "node:path"; +import { afterEach, describe, expect, test } from "vitest"; +import { + buildChatGptImageConsultInput, + registerChatGptImageTool, +} from "../../src/mcp/tools/chatgptImage.ts"; +import { setOracleHomeDirOverrideForTest } from "../../src/oracleHome.js"; + +function registerHandler(): (input: unknown) => Promise<unknown> { + const handlers: Array<(input: unknown) => Promise<unknown>> = []; + registerChatGptImageTool({ + registerTool: (_name: string, _def: unknown, fn: (input: unknown) => Promise<unknown>) => { + handlers.push(fn); + }, + server: { + sendLoggingMessage: async () => undefined, + }, + } as unknown as Parameters<typeof registerChatGptImageTool>[0]); + const handler = handlers[0]; + if (!handler) throw new Error("handler not registered"); + return handler; +} + +describe("chatgpt_image MCP tool", () => { + afterEach(() => { + setOracleHomeDirOverrideForTest(null); + }); + + test("builds an image-aware browser consult with uploaded references", () => { + const input = buildChatGptImageConsultInput({ + prompt: "Create an App Store screenshot background.", + files: ["reference.png"], + outputPath: "/tmp/screenshot-bg.png", + aspectRatio: "9:16", + browserThinkingTime: "extended", + }); + + expect(input).toMatchObject({ + engine: "browser", + generateImage: "/tmp/screenshot-bg.png", + files: ["reference.png"], + browserAttachments: "always", + browserThinkingTime: "extended", + }); + expect(input.prompt).toContain("aspect ratio 9:16"); + }); + + test("uses a unique default output path when agents only provide a prompt", () => { + const first = buildChatGptImageConsultInput({ prompt: "Create a simple 
app icon.", files: [] }); + const second = buildChatGptImageConsultInput({ + prompt: "Create a simple app icon.", + files: [], + }); + + expect(first.engine).toBe("browser"); + expect(first.generateImage).toMatch(/generated\/chatgpt-image-[a-z0-9-]+\.png$/); + // Random suffix keeps concurrent default paths from colliding. + expect(first.generateImage).not.toBe(second.generateImage); + expect(first.browserAttachments).toBeUndefined(); + }); + + test("returns resolved dry-run details from the registered tool", async () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + const handler = registerHandler(); + const target = path.join(home, "product-mockup.png"); + const result = (await handler({ + dryRun: true, + prompt: "Create a small product mockup.", + outputPath: target, + aspectRatio: "1:1", + })) as { + structuredContent: { + requestedOutputPath: string; + resolved: { browser?: { imageOutputPath?: string } }; + }; + }; + + expect(result.structuredContent.requestedOutputPath).toBe(target); + expect(result.structuredContent.resolved.browser?.imageOutputPath).toBe(target); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test("rejects an output path outside the Oracle home", async () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + const handler = registerHandler(); + const result = (await handler({ + dryRun: true, + prompt: "Create a small product mockup.", + outputPath: "/tmp/escape.png", + })) as { isError?: boolean }; + + expect(result.isError).toBe(true); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); +}); diff --git a/tests/mcp/consult.test.ts b/tests/mcp/consult.test.ts index 776831fed..beca0ab3d 100644 --- a/tests/mcp/consult.test.ts +++ b/tests/mcp/consult.test.ts @@ -1,11 +1,17 @@ +import { mkdtempSync, rmSync } from "node:fs"; +import { tmpdir } from "node:os"; 
+import path from "node:path"; import { describe, expect, test } from "vitest"; import type { SessionModelRun } from "../../src/sessionStore.js"; import { applyConsultPreset } from "../../src/mcp/consultPresets.ts"; +import { setOracleHomeDirOverrideForTest } from "../../src/oracleHome.js"; import { buildConsultBrowserConfig, buildConsultDryRunResolved, formatConsultDryRunResolved, registerConsultTool, + summarizeArtifactsForConsult, + summarizeImageArtifactsForConsult, summarizeModelRunsForConsult, } from "../../src/mcp/tools/consult.ts"; @@ -78,6 +84,54 @@ describe("summarizeModelRunsForConsult", () => { expect(summarizeModelRunsForConsult(undefined)).toBeUndefined(); }); + test("surfaces saved image artifacts for agent callers", () => { + const artifacts = [ + { + kind: "image", + path: "/tmp/mockup.png", + label: "Generated image", + mimeType: "image/png", + sizeBytes: 1234, + sourceUrl: "https://chatgpt.com/backend-api/estuary/content?id=file_abc", + url: "https://chatgpt.com/backend-api/estuary/content?id=file_abc", + finalUrl: "https://files.local/mockup.png", + alt: "generated image", + width: 1024, + height: 1024, + fileId: "file_abc", + }, + { + kind: "transcript", + path: "/tmp/transcript.md", + label: "Browser transcript", + }, + ]; + + const sessionArtifacts = artifacts as Parameters<typeof summarizeArtifactsForConsult>[0]; + + expect(summarizeArtifactsForConsult(sessionArtifacts)).toEqual([ + expect.objectContaining({ + kind: "image", + path: "/tmp/mockup.png", + mimeType: "image/png", + }), + expect.objectContaining({ + kind: "transcript", + path: "/tmp/transcript.md", + }), + ]); + expect(summarizeImageArtifactsForConsult(sessionArtifacts)).toEqual([ + expect.objectContaining({ + kind: "image", + path: "/tmp/mockup.png", + url: "https://chatgpt.com/backend-api/estuary/content?id=file_abc", + width: 1024, + height: 1024, + fileId: "file_abc", + }), + ]); + }); + test("merges browser defaults from config for consult runs", () => { const config = buildConsultBrowserConfig({ userConfig: { 
@@ -171,6 +225,7 @@ describe("summarizeModelRunsForConsult", () => { browserBundleFiles: true, browserBundleFormat: "zip", browserFollowUps: ["challenge", "final"], + generateImage: "/tmp/oracle-image.png", }, browserConfig: { desiredModel: "GPT-5.5 Pro", @@ -196,63 +251,153 @@ describe("summarizeModelRunsForConsult", () => { bundleFiles: true, bundleFormat: "zip", profileDir: "/tmp/oracle-profile", + imageOutputPath: "/tmp/oracle-image.png", }, }); expect(resolved.guidance.join("\n")).toContain("signed-in ChatGPT profile"); expect(resolved.guidance.join("\n")).toContain("private Chrome profile"); expect(resolved.guidance.join("\n")).toContain("--browser-keep-browser"); + expect(resolved.guidance.join("\n")).toContain("image-aware wait/download"); expect(formatConsultDryRunResolved(resolved).join("\n")).toContain( "browser thinking time: extended", ); expect(formatConsultDryRunResolved(resolved).join("\n")).toContain( "browser bundle format: zip", ); + expect(formatConsultDryRunResolved(resolved).join("\n")).toContain( + "image output: /tmp/oracle-image.png", + ); }); test("returns resolved dry-run details from the registered MCP consult tool", async () => { - const handlers: Array<(input: unknown) => Promise<unknown>> = []; - registerConsultTool({ - registerTool: (_name: string, _def: unknown, fn: (input: unknown) => Promise<unknown>) => { - handlers.push(fn); - }, - server: { - sendLoggingMessage: async () => undefined, - }, - } as unknown as Parameters<typeof registerConsultTool>[0]); - const handler = handlers[0]; - if (!handler) throw new Error("handler not registered"); + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + const imagePath = path.join(home, "from-mcp.png"); + try { + const handlers: Array<(input: unknown) => Promise<unknown>> = []; + registerConsultTool({ + registerTool: (_name: string, _def: unknown, fn: (input: unknown) => Promise<unknown>) => { + handlers.push(fn); + }, + server: { + sendLoggingMessage: async () => undefined, + }, + } as unknown as 
Parameters<typeof registerConsultTool>[0]); + const handler = handlers[0]; + if (!handler) throw new Error("handler not registered"); - const result = (await handler({ - dryRun: true, - engine: "browser", - model: "gpt-5.5-pro", - prompt: "review this", - files: [], - browserThinkingTime: "extended", - browserModelStrategy: "select", - })) as { - content: Array<{ type: "text"; text: string }>; - structuredContent: { - status: string; - dryRun: boolean; - resolved: ReturnType<typeof buildConsultDryRunResolved>; + const result = (await handler({ + dryRun: true, + engine: "browser", + model: "gpt-5.5-pro", + prompt: "review this", + files: [], + browserThinkingTime: "extended", + browserModelStrategy: "select", + generateImage: imagePath, + })) as { + content: Array<{ type: "text"; text: string }>; + structuredContent: { + status: string; + dryRun: boolean; + resolved: ReturnType<typeof buildConsultDryRunResolved>; + }; }; - }; - expect(result.structuredContent).toMatchObject({ - status: "dry-run", - dryRun: true, - resolved: { - resolvedEngine: "browser", + expect(result.structuredContent).toMatchObject({ + status: "dry-run", + dryRun: true, + resolved: { + resolvedEngine: "browser", + model: "gpt-5.5-pro", + browser: expect.objectContaining({ + desiredModel: "Pro", + thinkingTime: "extended", + modelStrategy: "select", + imageOutputPath: imagePath, + }), + }, + }); + expect(result.content[0]?.text).toContain("[dry-run] MCP resolved request:"); + } finally { + setOracleHomeDirOverrideForTest(null); + rmSync(home, { recursive: true, force: true }); + } + }); + + test("rejects an MCP consult image path outside the Oracle home", async () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + const handlers: Array<(input: unknown) => Promise<unknown>> = []; + registerConsultTool({ + registerTool: (_name: string, _def: unknown, fn: (input: unknown) => Promise<unknown>) => { + handlers.push(fn); + }, + server: { + sendLoggingMessage: async () => undefined, + }, + } as unknown as Parameters<typeof registerConsultTool>[0]); + const handler = 
handlers[0]; + if (!handler) throw new Error("handler not registered"); + + const result = (await handler({ + dryRun: true, + engine: "browser", model: "gpt-5.5-pro", - browser: expect.objectContaining({ - desiredModel: "Pro", - thinkingTime: "extended", - modelStrategy: "select", - }), - }, - }); - expect(result.content[0]?.text).toContain("[dry-run] MCP resolved request:"); + prompt: "review this", + files: [], + generateImage: "/tmp/from-mcp.png", + })) as { isError?: boolean; content: Array<{ type: "text"; text: string }> }; + + expect(result.isError).toBe(true); + expect(result.content[0]?.text).toMatch(/Oracle home directory/); + } finally { + setOracleHomeDirOverrideForTest(null); + rmSync(home, { recursive: true, force: true }); + } + }); + + test("fails closed for image output over a remote browser service", async () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + const prevHost = process.env.ORACLE_REMOTE_HOST; + const prevToken = process.env.ORACLE_REMOTE_TOKEN; + process.env.ORACLE_REMOTE_HOST = "remote.example:8080"; + process.env.ORACLE_REMOTE_TOKEN = "remote-token"; + try { + const handlers: Array<(input: unknown) => Promise<unknown>> = []; + registerConsultTool({ + registerTool: (_name: string, _def: unknown, fn: (input: unknown) => Promise<unknown>) => { + handlers.push(fn); + }, + server: { sendLoggingMessage: async () => undefined }, + } as unknown as Parameters<typeof registerConsultTool>[0]); + const handler = handlers[0]; + if (!handler) throw new Error("handler not registered"); + + const result = (await handler({ + engine: "browser", + model: "gpt-5.5", + prompt: "make an image", + files: [], + // Path under the Oracle home so containment passes and we reach the + // remote guard rather than the path check. 
+ generateImage: path.join(home, "generated", "img.png"), + })) as { isError?: boolean; content: Array<{ type: "text"; text: string }> }; + + expect(result.isError).toBe(true); + expect(result.content[0]?.text).toMatch( + /image output is not supported with a remote browser/i, + ); + } finally { + if (prevHost === undefined) delete process.env.ORACLE_REMOTE_HOST; + else process.env.ORACLE_REMOTE_HOST = prevHost; + if (prevToken === undefined) delete process.env.ORACLE_REMOTE_TOKEN; + else process.env.ORACLE_REMOTE_TOKEN = prevToken; + setOracleHomeDirOverrideForTest(null); + rmSync(home, { recursive: true, force: true }); + } }); test("rejects unsupported consult fields instead of silently ignoring them", async () => { diff --git a/tests/mcp/utils.test.ts b/tests/mcp/utils.test.ts index 12f49b25d..82b501bb9 100644 --- a/tests/mcp/utils.test.ts +++ b/tests/mcp/utils.test.ts @@ -1,7 +1,15 @@ -import { describe, expect, test } from "vitest"; +import { mkdirSync, mkdtempSync, rmSync, symlinkSync } from "node:fs"; +import { tmpdir } from "node:os"; +import path from "node:path"; +import { afterEach, describe, expect, test } from "vitest"; import { mapConsultToRunOptions } from "../../src/mcp/utils.js"; +import { setOracleHomeDirOverrideForTest } from "../../src/oracleHome.js"; describe("mapConsultToRunOptions", () => { + afterEach(() => { + setOracleHomeDirOverrideForTest(null); + }); + test("passes multi-model selections through to run options", () => { const env: NodeJS.ProcessEnv = {}; env.OPENAI_API_KEY = "sk-test"; @@ -35,4 +43,179 @@ describe("mapConsultToRunOptions", () => { "final concise decision", ]); }); + + test("maps external ChatGPT image output paths when external output is allowed", () => { + const env: NodeJS.ProcessEnv = { ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT: "1" }; + const { runOptions, resolvedEngine } = mapConsultToRunOptions({ + prompt: "generate a product mockup", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: " 
/tmp/mockup.png ", + outputPath: " /tmp/fallback.png ", + userConfig: undefined, + env, + }); + + expect(resolvedEngine).toBe("browser"); + expect(runOptions.generateImage).toBe(path.resolve("/tmp/mockup.png")); + expect(runOptions.outputPath).toBe(path.resolve("/tmp/fallback.png")); + }); + + test("rejects MCP output paths outside the Oracle home by default", () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + expect(() => + mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: "/tmp/escape.png", + userConfig: undefined, + env: {}, + }), + ).toThrow(/Oracle home directory/); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test("rejects traversal escapes from the Oracle home", () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + expect(() => + mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: path.join(home, "..", "escape.png"), + userConfig: undefined, + env: {}, + }), + ).toThrow(/Oracle home directory/); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test("allows MCP output paths under the Oracle home without opt-in", () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + const target = path.join(home, "generated", "img.png"); + const { runOptions } = mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: target, + userConfig: undefined, + env: {}, + }); + expect(runOptions.generateImage).toBe(target); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test("rejects a symlinked parent that escapes the Oracle home (generateImage)", () => { + const home = mkdtempSync(path.join(tmpdir(), 
"oracle-home-")); + const outside = mkdtempSync(path.join(tmpdir(), "oracle-outside-")); + setOracleHomeDirOverrideForTest(home); + try { + // ORACLE_HOME/escape -> /outside (a symlinked dir leaving the boundary). + // The target is lexically under home but really writes into `outside`. + symlinkSync(outside, path.join(home, "escape")); + const target = path.join(home, "escape", "img.png"); + expect(() => + mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: target, + userConfig: undefined, + env: {}, + }), + ).toThrow(/Oracle home directory/); + } finally { + rmSync(home, { recursive: true, force: true }); + rmSync(outside, { recursive: true, force: true }); + } + }); + + test("rejects a symlinked parent that escapes the Oracle home (outputPath)", () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + const outside = mkdtempSync(path.join(tmpdir(), "oracle-outside-")); + setOracleHomeDirOverrideForTest(home); + try { + symlinkSync(outside, path.join(home, "escape")); + const target = path.join(home, "escape", "answer.md"); + expect(() => + mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + outputPath: target, + userConfig: undefined, + env: {}, + }), + ).toThrow(/Oracle home directory/); + } finally { + rmSync(home, { recursive: true, force: true }); + rmSync(outside, { recursive: true, force: true }); + } + }); + + test("allows a symlinked parent that stays within the Oracle home", () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + setOracleHomeDirOverrideForTest(home); + try { + // ORACLE_HOME/real exists; ORACLE_HOME/link -> ORACLE_HOME/real (still inside). 
+ const realDir = path.join(home, "real"); + mkdirSync(realDir); + symlinkSync(realDir, path.join(home, "link")); + const target = path.join(home, "link", "img.png"); + const { runOptions } = mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: target, + userConfig: undefined, + env: {}, + }); + expect(runOptions.generateImage).toBe(target); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test("allows a symlink escape when external output is explicitly enabled", () => { + const home = mkdtempSync(path.join(tmpdir(), "oracle-home-")); + const outside = mkdtempSync(path.join(tmpdir(), "oracle-outside-")); + setOracleHomeDirOverrideForTest(home); + try { + symlinkSync(outside, path.join(home, "escape")); + const target = path.join(home, "escape", "img.png"); + const { runOptions } = mapConsultToRunOptions({ + prompt: "x", + files: [], + model: "gpt-5.5-pro", + engine: "browser", + generateImage: target, + userConfig: undefined, + env: { ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT: "1" }, + }); + expect(runOptions.generateImage).toBe(path.resolve(target)); + } finally { + rmSync(home, { recursive: true, force: true }); + rmSync(outside, { recursive: true, force: true }); + } + }); });
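The symlink tests above rest on one idea: a lexical prefix check accepts a path whose parent is a symlink pointing out of the Oracle home, while a check done after realpath-ing the deepest existing ancestor rejects it. The sketch below illustrates that difference in isolation. It is not the code from `src/mcp/utils.ts`: `resolveExisting` and `isInside` are hypothetical names introduced here, the temp directories stand in for the Oracle home, and it assumes a platform where `symlinkSync` can create directory symlinks without extra privileges.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not from the diff): realpath the deepest ancestor of
// `target` that exists on disk, then re-append the not-yet-created segments.
function resolveExisting(target: string): string {
  let current = path.resolve(target);
  const tail: string[] = [];
  for (;;) {
    try {
      return path.join(fs.realpathSync(current), ...tail);
    } catch {
      const parent = path.dirname(current);
      if (parent === current) return path.resolve(target); // reached the root
      tail.unshift(path.basename(current));
      current = parent;
    }
  }
}

// Hypothetical helper: true when `target` really lands at or below `root`.
function isInside(root: string, target: string): boolean {
  const realRoot = fs.realpathSync(root);
  const real = resolveExisting(target);
  return real === realRoot || real.startsWith(realRoot + path.sep);
}

// Stand-ins for the Oracle home and for a directory outside it.
const home = fs.mkdtempSync(path.join(os.tmpdir(), "home-"));
const outside = fs.mkdtempSync(path.join(os.tmpdir(), "outside-"));
fs.symlinkSync(outside, path.join(home, "escape"));

const viaSymlink = path.join(home, "escape", "img.png");
// String-prefix check only: the symlinked parent goes unnoticed.
const lexicalOk = path.resolve(viaSymlink).startsWith(path.resolve(home) + path.sep);
// Symlink-aware check: the write would land in `outside`, so it is refused.
const realOk = isInside(home, viaSymlink);
// An ordinary, not-yet-created path under the home is still accepted.
const plainOk = isInside(home, path.join(home, "generated", "img.png"));
console.log({ lexicalOk, realOk, plainOk });

fs.rmSync(home, { recursive: true, force: true });
fs.rmSync(outside, { recursive: true, force: true });
```

Under those assumptions the lexical check accepts the symlinked path while the realpath-based check rejects it, which is the case the "rejects a symlinked parent" tests pin down. Walking up to the deepest existing ancestor matters because the output file, and often its parent directory, does not exist yet when the path is validated.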