6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@
### Added

- Browser: `oracle session <id> --harvest` and `--live` now auto-recover when the original Chrome has been closed: they relaunch the manual-login profile, reopen the saved conversation URL, and retry the harvest against the recovered tab. Resolves the failure mode where a long GPT-5 Pro Extended response completed in the background after the CLI's 20-minute wall-clock limit expired and the conversation was archived. Recovery URL selection prefers `browser.harvest.url` over `browser.runtime.tabUrl` and is gated by a shared ChatGPT-conversation-URL check (rejects home, project shell, and external URLs so the persistent profile can't be navigated to the wrong page from stale metadata). Opt out with `--no-recover` on the `session` subcommand.
- MCP: add a dedicated `chatgpt_image` tool plus `generateImage` / `outputPath` support in `consult` so agent callers can trigger the ChatGPT image-aware wait/download path used by CLI `--generate-image`; saved image artifacts now come back in `structuredContent.images`. The `chatgpt_image` output reuses the typed `consult` output contract (`images` / `artifacts` / `resolved`) and its default output path carries a random suffix so concurrent agent calls cannot collide.
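The recovery URL gate described above amounts to a preference order plus a URL predicate. A minimal sketch, assuming ChatGPT conversation URLs have the form `https://chatgpt.com/c/<id>` (or `https://chatgpt.com/g/<project>/c/<id>` inside a project); `isChatGptConversationUrl` and `selectRecoveryUrl` are hypothetical names, not Oracle's actual helpers:

```typescript
// Hypothetical helpers illustrating the recovery-URL gate; not Oracle's actual code.
// Assumption: conversations live at https://chatgpt.com/c/<id> or
// https://chatgpt.com/g/<project>/c/<id>.
function isChatGptConversationUrl(raw: string | undefined): raw is string {
  if (!raw) return false;
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL
  }
  if (url.protocol !== "https:" || url.hostname !== "chatgpt.com") {
    return false; // external URL
  }
  // Home ("/") and project shells have no trailing /c/<id> segment.
  return /\/c\/[A-Za-z0-9-]+\/?$/.test(url.pathname);
}

// Prefer the harvest URL, fall back to the runtime tab URL, refuse anything else.
function selectRecoveryUrl(harvestUrl?: string, tabUrl?: string): string | null {
  for (const candidate of [harvestUrl, tabUrl]) {
    if (isChatGptConversationUrl(candidate)) {
      return candidate;
    }
  }
  return null;
}
```

Returning `null` rather than a best-effort URL keeps the persistent profile from being pointed at stale or unrelated pages.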

### Security

- MCP: constrain agent-supplied `generateImage` / `outputPath` to the Oracle home directory (`ORACLE_HOME_DIR`) by default so an MCP caller cannot write generated images or saved responses to arbitrary host paths. `..` traversal is rejected, and the boundary check resolves symlinks in the existing path prefix (via `realpath`) so a symlinked parent under the Oracle home cannot smuggle a write outside it. Set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to opt into external output paths as an explicit decision. CLI `--generate-image` / `--output` are unaffected.
- MCP: `chatgpt_image` / `consult` image output fails closed when a remote browser service is configured (`ORACLE_REMOTE_HOST`), since the remote executor does not transfer image artifacts back and the `structuredContent.images` contract could not be fulfilled. The run is rejected with a clear error instead of silently returning no images.
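The output-path boundary check above can be sketched as: reject `..` segments outright, canonicalize the deepest existing ancestor with `realpath`, and require the result to stay under the canonical Oracle home. This is a minimal sketch using Node's `fs`/`path`; `entryExists`, `realpathExistingPrefix`, `isInsideOracleHome`, and `assertAllowedOutputPath` are hypothetical names, and Oracle's real check may differ in details such as how relative paths are anchored.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of an ORACLE_HOME_DIR boundary check; not Oracle's actual code.

// lstat (not stat) so a dangling symlink still counts as an existing entry.
function entryExists(p: string): boolean {
  return fs.lstatSync(p, { throwIfNoEntry: false }) !== undefined;
}

// Canonicalize the deepest existing ancestor with realpath, then re-append the
// not-yet-created tail (for example "generated/out.png").
function realpathExistingPrefix(target: string): string {
  let current = path.resolve(target);
  const missing: string[] = [];
  while (!entryExists(current)) {
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    missing.unshift(path.basename(current));
    current = parent;
  }
  return path.join(fs.realpathSync(current), ...missing);
}

function isInsideOracleHome(candidate: string, homeDir: string): boolean {
  // Reject ".." segments outright instead of normalizing them away.
  if (candidate.split(/[\\/]+/).includes("..")) return false;
  let resolved: string;
  try {
    resolved = realpathExistingPrefix(candidate);
  } catch {
    return false; // e.g. a dangling symlink in the prefix: fail closed
  }
  const rel = path.relative(fs.realpathSync(homeDir), resolved);
  return !(rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel));
}

function assertAllowedOutputPath(
  candidate: string,
  homeDir: string,
  env: Record<string, string | undefined> = process.env,
): void {
  if (env.ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT === "1") return; // explicit opt-in to external paths
  if (!isInsideOracleHome(candidate, homeDir)) {
    throw new Error(`Output path must stay under ${homeDir}: ${candidate}`);
  }
}
```

Resolving only the existing prefix matters because the target file (and often its `generated/` directory) does not exist yet, while a symlinked parent that does exist is exactly what could redirect the write.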

## 0.13.0 — 2026-05-22

1 change: 1 addition & 0 deletions README.md
@@ -125,6 +125,7 @@ Engine auto-picks API when `OPENAI_API_KEY` is set, otherwise browser; browser i
- Browser support: stable on macOS; works on Linux (add `--browser-chrome-path/--browser-cookie-path` when needed) and Windows (manual-login or inline cookies recommended when app-bound cookies block decryption).
- Remote browser service: `oracle serve` on a signed-in host; clients use `--remote-host/--remote-token`.
- Browser artifacts: browser sessions save `transcript.md` and generated artifacts under `~/.oracle/sessions/<id>/artifacts/`. Deep Research saves `deep-research-report.md` when the report surface is captured; ChatGPT-generated images are downloaded with the active browser cookies when image URLs are present.
- MCP image agents: use the `chatgpt_image` tool for the easiest path, or pass `generateImage: "/path/out.png"` to `consult` with `engine: "browser"`; saved paths come back in `structuredContent.images`.
- Browser archiving: by default, successful non-project, non-Deep-Research, non-multi-turn ChatGPT one-shots are archived after local artifacts are saved. Use `--browser-archive never` to disable or `--browser-archive always` to force archiving after a successful browser run. Archived chats remain manageable in ChatGPT.
- Conversation mode guidance: use one-shot browser runs for narrow bug reports or quick file-set reviews; use explicit browser follow-ups for ambiguous architecture/product tradeoffs where a challenge pass and final decision are valuable; use Deep Research for broad public-web questions that need citations. Oracle never invents follow-ups automatically.
- Project Sources: `oracle project-sources list|add --chatgpt-url <project-url>` manages the Project Sources tab in ChatGPT browser mode. v1 is append-only (`list`, `add`, `--dry-run`) so agents can share explicit project context without deleting or replacing user sources.
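The default archiving rule above reduces to a small predicate. A minimal sketch, assuming a hypothetical run summary; `BrowserRunSummary`, its field names, and `shouldArchive` are illustrative and not Oracle's session schema:

```typescript
// Hypothetical run summary; field names are illustrative, not Oracle's session schema.
interface BrowserRunSummary {
  status: "completed" | "failed" | "incomplete";
  artifactsSaved: boolean; // transcript/artifacts already written locally
  isProject: boolean;
  deepResearch: boolean;
  followUpCount: number; // > 0 means a multi-turn consult
}

type ArchiveMode = "auto" | "always" | "never";

function shouldArchive(mode: ArchiveMode, run: BrowserRunSummary): boolean {
  // Every mode requires a successful run whose local artifacts are already saved.
  const succeeded = run.status === "completed" && run.artifactsSaved;
  if (mode === "never" || !succeeded) return false;
  if (mode === "always") return true;
  // "auto": only plain one-shots (no project, no Deep Research, no follow-ups).
  return !run.isProject && !run.deepResearch && run.followUpCount === 0;
}
```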
2 changes: 2 additions & 0 deletions docs/browser-mode.md
@@ -238,6 +238,8 @@ oracle --engine browser \

If ChatGPT returns multiple images, the first image saves to the requested path and the rest save as numbered siblings. Without `--generate-image`, Oracle writes images to the session `artifacts/` directory.
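A minimal sketch of the multiple-image naming, assuming siblings take a `-2`, `-3` (and so on) suffix before the extension; the exact suffix format Oracle uses is not shown here, and `imageOutputPaths` is a hypothetical helper:

```typescript
import * as path from "node:path";

// Hypothetical helper: the first image keeps the requested path, later ones get
// "-<n>" before the extension. The "-<n>" scheme is an assumption for illustration.
function imageOutputPaths(requested: string, count: number): string[] {
  const ext = path.extname(requested); // ".png", or "" when there is no extension
  const stem = requested.slice(0, requested.length - ext.length);
  return Array.from({ length: count }, (_, index) =>
    index === 0 ? requested : `${stem}-${index + 1}${ext}`,
  );
}
```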

MCP agents should prefer the `chatgpt_image` tool. It wraps the same behavior with a smaller input shape, uploads reference files by default, and returns saved files in `structuredContent.images`. Advanced callers can still pass `generateImage` to `consult` directly.

### Manual login mode (persistent profile, no cookie copy)

Use `--browser-manual-login` when cookie decrypt is blocked (e.g., Windows app-bound cookies) or you prefer to sign in explicitly. You can also make it the default via `browser.manualLogin` in `~/.oracle/config.json`.
34 changes: 33 additions & 1 deletion docs/mcp.md
@@ -8,22 +8,54 @@ Claude Code can call `oracle-mcp` and ask a subscription-backed ChatGPT browser

## Tools

### `chatgpt_image`

- Inputs: `prompt` (required), `files?: string[]` for reference images/assets, `outputPath?: string`, `aspectRatio?: string`, `model?: string`, plus browser controls such as `browserThinkingTime`, `browserModelLabel`, `browserModelStrategy`, `browserArchive`, `browserKeepBrowser`, and `dryRun`.
- Behavior: convenience wrapper for ChatGPT browser image generation. It forces `engine:"browser"`, sets `generateImage` for the existing image-aware wait/download path, and defaults `browserAttachments:"always"` when files are provided so reference images are uploaded instead of pasted.
- Output: returns the normal session metadata plus `requestedOutputPath` and `structuredContent.images[]` with saved image paths and ChatGPT file metadata when available. If `outputPath` is omitted, Oracle picks a unique file under `ORACLE_HOME_DIR/generated/`.
- Output path safety: agent-supplied `outputPath` must resolve under `ORACLE_HOME_DIR` by default. Paths outside it are rejected, including `..` traversal and symlinked parents that escape the home once resolved via `realpath`. Set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to allow writing elsewhere as an explicit decision. Omit `outputPath` to use the safe default.
- Local browser only: image output is unsupported when a remote browser service is configured (`ORACLE_REMOTE_HOST`); the image would be written on the remote host and not transferred back, so `chatgpt_image`/`consult` image runs fail closed with a clear error rather than returning empty `structuredContent.images`. Run on the local browser to generate images.

```json
{
  "prompt": "Create a 9:16 App Store screenshot background for a focus timer.",
  "files": ["./reference-screen.png"],
  "aspectRatio": "9:16"
}
```
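The "local browser only" rule above is a fail-closed guard that runs before any browser work starts. A minimal sketch; only the `ORACLE_REMOTE_HOST` variable name comes from these docs, while `ImageRunRequest` and `assertImageOutputSupported` are hypothetical:

```typescript
// Hypothetical guard; only the ORACLE_REMOTE_HOST variable name comes from the docs.
interface ImageRunRequest {
  generateImage?: string;
}

function assertImageOutputSupported(
  request: ImageRunRequest,
  env: Record<string, string | undefined> = process.env,
): void {
  const remoteHost = env.ORACLE_REMOTE_HOST?.trim();
  if (request.generateImage && remoteHost) {
    // The remote executor does not send image artifacts back, so reject up front
    // instead of returning an empty structuredContent.images.
    throw new Error(
      `Image generation is not supported with a remote browser service (ORACLE_REMOTE_HOST=${remoteHost}). ` +
        "Run on the local browser to generate images.",
    );
  }
}
```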

### `consult`

- Inputs: `prompt` (required), `files?: string[]` (globs), `model?: string` (defaults to CLI), `engine?: "api" | "browser"` (optional; Oracle follows CLI defaults: `ORACLE_ENGINE` and the effective config first, then API when `OPENAI_API_KEY` is set, otherwise browser), `slug?: string`.
- Presets: `preset?: "chatgpt-pro-heavy"` applies browser mode + current Pro model alias + extended thinking, unless the request overrides those fields.
-- Browser-only extras: `browserAttachments?: "auto"|"never"|"always"`, `browserBundleFiles?: boolean`, `browserBundleFormat?: "text"|"zip"`, `browserThinkingTime?: "light"|"standard"|"extended"|"heavy"`, `browserResearchMode?: "deep"`, `browserFollowUps?: string[]`, `browserArchive?: "auto"|"always"|"never"`, `browserKeepBrowser?: boolean`, `browserModelLabel?: string`, `browserModelStrategy?: "select"|"current"|"ignore"`.
+- Browser-only extras: `browserAttachments?: "auto"|"never"|"always"`, `browserBundleFiles?: boolean`, `browserBundleFormat?: "text"|"zip"`, `browserThinkingTime?: "light"|"standard"|"extended"|"heavy"`, `browserResearchMode?: "deep"`, `browserFollowUps?: string[]`, `browserArchive?: "auto"|"always"|"never"`, `browserKeepBrowser?: boolean`, `browserModelLabel?: string`, `browserModelStrategy?: "select"|"current"|"ignore"`, `generateImage?: string`, `outputPath?: string`.
- Dry runs: set `dryRun: true` to preview the resolved request without creating a session or touching the browser.
- Behavior: starts a session, runs it with the chosen engine, returns final output + metadata. Background/foreground follows the CLI (e.g., GPT‑5 Pro detaches by default). If API mode fails because `OPENAI_API_KEY` is missing and you have ChatGPT Pro, retry with `engine: "browser"` or `preset: "chatgpt-pro-heavy"` to use your signed-in ChatGPT session instead of an API key.
- Logging: emits MCP logs (`info` per line, `debug` for streamed chunks with byte sizes). If browser prerequisites are missing, returns an error payload instead of running.
- Research mode: set `browserResearchMode:"deep"` for broad public-web research and cited reports. Use normal browser runs with `gpt-5.5-pro` + `browserThinkingTime:"extended"` for Pro Extended code review, or `gpt-5.5` + `browserThinkingTime:"heavy"` when you explicitly want Thinking Heavy.
- Multi-turn consults: set `browserFollowUps:["Challenge your recommendation", "Give the final decision"]` to keep one ChatGPT browser conversation open and ask sequential follow-up prompts. Use one-shot calls for narrow bugs and exact file-set reviews; use multi-turn for ambiguous architecture/product decisions where a challenge pass and final recommendation are useful; use Deep Research for broad public-web work with citations. Oracle never invents follow-ups automatically.
- Archiving: set `browserArchive:"auto"|"always"|"never"` to control ChatGPT conversation cleanup. `auto` archives only successful browser one-shots after local artifacts are saved, and skips project, Deep Research, multi-turn, failed, and incomplete sessions.
- ChatGPT image generation: set `engine:"browser"` and `generateImage` to a path under `ORACLE_HOME_DIR` to use the same image-aware wait/download path as CLI `--generate-image`. Saved files are returned in `structuredContent.images` and recorded as session artifacts; multiple images save as numbered siblings. Agent-supplied `generateImage` / `outputPath` are constrained to `ORACLE_HOME_DIR` by default (set `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` to allow external paths).
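The engine fallback described for `consult` inputs can be sketched as a pure function. This is a minimal sketch assuming an explicit `engine` wins, then `ORACLE_ENGINE`, then the effective config; the docs group `ORACLE_ENGINE` and the config together, so their relative precedence here is an assumption, and `resolveEngine` / `EngineInputs` are hypothetical names:

```typescript
type Engine = "api" | "browser";

interface EngineInputs {
  requested?: Engine; // the consult call's `engine` field
  env: Record<string, string | undefined>;
  configEngine?: Engine; // engine from the effective Oracle config, if any
}

// Hypothetical resolver; the order of ORACLE_ENGINE vs. config is an assumption.
function resolveEngine({ requested, env, configEngine }: EngineInputs): Engine {
  if (requested) return requested;
  const fromEnv = env.ORACLE_ENGINE?.trim();
  if (fromEnv === "api" || fromEnv === "browser") return fromEnv;
  if (configEngine) return configEngine;
  // No explicit choice: API when a key is available, otherwise the signed-in browser.
  return env.OPENAI_API_KEY ? "api" : "browser";
}
```

This is also why a missing `OPENAI_API_KEY` surfaces as an API-mode failure only when something upstream forced `api`; otherwise the fallback lands on `browser`.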

#### Long browser consults from agents

Browser-backed GPT-5.5 Pro consults can legitimately run for many minutes. Some MCP clients show little progress while a tool call is active, so agents should treat a long Oracle call as a running browser job, not as a failed step. Start with `dryRun:true` when configuring a new agent, prefer `preset:"chatgpt-pro-heavy"` or `engine:"browser"` explicitly, and use the shared session store (`sessions`, `oracle status`, or `oracle session <id>`) before retrying a prompt. If the browser control plan says Oracle will launch visible Chrome, use attach/remote Chrome when the operator is actively using the computer.

#### ChatGPT images from agents

For generated images, pass an explicit `generateImage` path. That opt-in is important because it switches the browser wait loop to watch for ChatGPT image artifacts instead of only assistant text. The path must resolve under `ORACLE_HOME_DIR` unless `ORACLE_MCP_ALLOW_EXTERNAL_OUTPUT=1` is set.

```json
{
  "engine": "browser",
  "model": "gpt-5.5-pro",
  "prompt": "Create a 9:16 App Store screenshot background for a focus timer.",
  "generateImage": "${ORACLE_HOME_DIR}/generated/focus-timer-bg.png"
}
```

The MCP response includes `structuredContent.images[]` with the saved file path, MIME type, size, and ChatGPT file metadata when available.
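The effect of the `generateImage` opt-in on the wait loop can be pictured as a different completion predicate. This is a hypothetical sketch: `AssistantTurnSnapshot` and `isTurnComplete` are illustrative names, and Oracle's real browser poller tracks more state than shown here, including timeouts for turns that never produce an image.

```typescript
// Hypothetical snapshot of the assistant turn a browser poller might observe.
interface AssistantTurnSnapshot {
  finished: boolean; // streaming has stopped
  text: string;
  imageUrls: string[];
}

// With generateImage set, a finished turn that only has text is not done yet:
// the poller keeps waiting until at least one image artifact is present.
function isTurnComplete(turn: AssistantTurnSnapshot, expectImages: boolean): boolean {
  if (!turn.finished) return false;
  if (expectImages) return turn.imageUrls.length > 0;
  return turn.text.trim().length > 0;
}
```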

### `sessions`

- Inputs: `{id?, hours?, limit?, includeAll?, detail?}` mirroring `oracle status` / `oracle session`.
8 changes: 5 additions & 3 deletions src/browser/chatgptImages.ts
@@ -1,5 +1,6 @@
 import fs from "node:fs/promises";
 import path from "node:path";
+import { randomUUID } from "node:crypto";
 import type {
   BrowserGeneratedImage,
   BrowserLogger,
@@ -203,9 +204,10 @@ function resolveDefaultGeneratedImagePath(
   sessionId?: string,
 ): string {
   const primary = images[0];
-  const stemSource =
-    primary?.fileId || primary?.alt || primary?.url || `generated-${Date.now().toString(36)}`;
-  const stem = sanitizeGeneratedImageStem(stemSource) || `generated-${Date.now().toString(36)}`;
+  // Random fallback token keeps concurrent session-less saves from colliding.
+  const uniqueFallback = `generated-${Date.now().toString(36)}-${randomUUID().slice(0, 8)}`;
+  const stemSource = primary?.fileId || primary?.alt || primary?.url || uniqueFallback;
+  const stem = sanitizeGeneratedImageStem(stemSource) || uniqueFallback;
   const baseDir = sessionId
     ? resolveSessionArtifactsDir(sessionId)
     : path.join(getOracleHomeDir(), ".temp");
2 changes: 2 additions & 0 deletions src/mcp/server.ts
@@ -5,6 +5,7 @@ import { pathToFileURL } from "node:url";
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
 import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
 import { getCliVersion } from "../version.js";
+import { registerChatGptImageTool } from "./tools/chatgptImage.js";
 import { registerConsultTool } from "./tools/consult.js";
 import { registerProjectSourcesTool } from "./tools/projectSources.js";
 import { registerSessionsTool } from "./tools/sessions.js";
@@ -24,6 +25,7 @@ export async function startMcpServer(): Promise<void> {
   );

   registerConsultTool(server);
+  registerChatGptImageTool(server);
   registerProjectSourcesTool(server);
   registerSessionsTool(server);
   registerSessionResources(server);
142 changes: 142 additions & 0 deletions src/mcp/tools/chatgptImage.ts
@@ -0,0 +1,142 @@
import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js";
import path from "node:path";
import { randomUUID } from "node:crypto";
import { z } from "zod";
import { getOracleHomeDir } from "../../oracleHome.js";
import type { ConsultInput } from "../types.js";
import { consultOutputShape, runConsultTool } from "./consult.js";

const chatGptImageInputShape = {
  prompt: z.string().min(1, "Prompt is required.").describe("Image generation prompt."),
  files: z
    .array(z.string())
    .default([])
    .describe("Optional reference image/file paths or globs to upload to ChatGPT."),
  outputPath: z
    .string()
    .optional()
    .describe(
      "Where to save the first generated image. Defaults to a unique file under ORACLE_HOME_DIR/generated/.",
    ),
  aspectRatio: z
    .string()
    .optional()
    .describe('Optional requested image aspect ratio, e.g. "1:1", "9:16", or "16:9".'),
  model: z
    .string()
    .optional()
    .describe("Optional ChatGPT/browser model label or alias. Defaults follow Oracle config."),
  browserModelLabel: z.string().optional().describe("Explicit ChatGPT UI model label to select."),
  browserAttachments: z
    .enum(["auto", "never", "always"])
    .optional()
    .describe(
      'How to deliver files. Defaults to "always" when files are present so reference images are uploaded.',
    ),
  browserThinkingTime: z
    .enum(["light", "standard", "extended", "heavy"])
    .optional()
    .describe("Set ChatGPT thinking time when supported by the chosen model."),
  browserModelStrategy: z
    .enum(["select", "current", "ignore"])
    .optional()
    .describe("Model picker strategy. Mirrors the consult tool and CLI browser flag."),
  browserArchive: z
    .enum(["auto", "always", "never"])
    .optional()
    .describe("Archive completed ChatGPT conversations after local artifacts are saved."),
  browserKeepBrowser: z
    .boolean()
    .optional()
    .describe("Keep Chrome running after completion for debugging."),
  dryRun: z
    .boolean()
    .optional()
    .describe("Preview the resolved image run without touching the browser."),
  slug: z.string().optional().describe("Optional human-friendly session id."),
} satisfies z.ZodRawShape;

const chatGptImageOutputShape = {
  // Mirror the consult output contract so structuredContent stays consistent
  // (images/artifacts/resolved are typed by the shared consult shapes), plus the
  // image-specific echo of the requested path.
  ...consultOutputShape,
  requestedOutputPath: z.string(),
} satisfies z.ZodRawShape;

const chatGptImageInputSchema = z.object(chatGptImageInputShape).strict();

export type ChatGptImageInput = z.infer<typeof chatGptImageInputSchema>;

function resolveDefaultImageOutputPath(): string {
  // Include a random token so concurrent agent calls in the same millisecond do
  // not resolve to the same default path and overwrite each other.
  const unique = `${Date.now().toString(36)}-${randomUUID().slice(0, 8)}`;
  return path.join(getOracleHomeDir(), "generated", `chatgpt-image-${unique}.png`);
}

function appendAspectRatio(prompt: string, aspectRatio?: string): string {
  const requestedAspectRatio = aspectRatio?.trim();
  if (!requestedAspectRatio) {
    return prompt.trim();
  }
  return `${prompt.trim()}\n\nCreate the image with aspect ratio ${requestedAspectRatio}.`;
}

export function buildChatGptImageConsultInput(input: ChatGptImageInput): ConsultInput {
  const files = input.files ?? [];
  const outputPath = input.outputPath?.trim() || resolveDefaultImageOutputPath();
  const browserAttachments =
    input.browserAttachments ?? (files.length > 0 ? ("always" as const) : undefined);
  return {
    prompt: appendAspectRatio(input.prompt, input.aspectRatio),
    files,
    model: input.model,
    engine: "browser",
    browserModelLabel: input.browserModelLabel,
    browserAttachments,
    browserThinkingTime: input.browserThinkingTime,
    browserModelStrategy: input.browserModelStrategy,
    browserArchive: input.browserArchive,
    browserKeepBrowser: input.browserKeepBrowser,
    generateImage: outputPath,
    dryRun: input.dryRun,
    slug: input.slug,
  };
}

export function registerChatGptImageTool(server: McpServer): void {
  server.registerTool(
    "chatgpt_image",
    {
      title: "Generate an image with ChatGPT",
      description:
        "Agent-friendly wrapper for ChatGPT browser image generation. It selects browser mode, enables the image-aware wait/download path, uploads reference files when provided, and returns saved image paths in structuredContent.images.",
      inputSchema: chatGptImageInputShape,
      outputSchema: chatGptImageOutputShape,
    },
    async (input: unknown): Promise<CallToolResult> => {
      const textContent = (text: string) => [{ type: "text" as const, text }];
      let parsed;
      try {
        parsed = chatGptImageInputSchema.parse(input);
      } catch (error) {
        return {
          isError: true,
          content: textContent(error instanceof Error ? error.message : String(error)),
        };
      }
      const consultInput = buildChatGptImageConsultInput(parsed);
      const result = await runConsultTool(consultInput, { server: server.server });
      const structuredContent = {
        ...(result.structuredContent ?? {}),
        requestedOutputPath: consultInput.generateImage,
      };
      return {
        ...result,
        structuredContent,
      };
    },
  );
}