diff --git a/docs/SANDBOX.md b/docs/SANDBOX.md
new file mode 100644
index 0000000..28ef188
--- /dev/null
+++ b/docs/SANDBOX.md
@@ -0,0 +1,225 @@
+# Sandbox Runtime
+
+GitPilot executes code in a configurable sandbox so the chat **▶ Run** button,
+the agent's autonomous build/test loop, and the HTTP API all share one
+runtime contract. Three backends ship in the box.
+
+## Backends
+
+| Backend          | Isolation                    | Use it when                                            |
+| ---------------- | ---------------------------- | ------------------------------------------------------ |
+| `subprocess`     | host process, cwd jail       | **Default.** Tries simple snippets locally.            |
+| `matrixlab`      | Docker container per snippet | Enterprise — untrusted code, multi-tenant, audit-able. |
+| `off`            | none (pass-through)          | Local dev only. No jail; equivalent to host shell.     |
+
+`subprocess` is the safe default so a fresh install runs hello-world without
+any setup. Operators pick `matrixlab` from **Settings → Sandbox runtime** for
+isolated, ephemeral, resource-limited execution.
+
+## Precedence
+
+Resolution order at every sandbox call:
+
+```
+explicit > GITPILOT_SANDBOX env > ~/.gitpilot/settings.json > "subprocess"
+```
+
+When an env var shadows the persisted choice, `GET /api/sandbox/status`
+returns `env_override: "GITPILOT_SANDBOX"` and the Settings panel renders an
+**env override** badge so the user understands why their UI selection isn't
+taking effect.
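A minimal sketch of the resolution order described above, written in JavaScript purely for illustration. The function name `resolveBackend`, its argument shape, and the decision to skip unrecognised values are assumptions; the real logic lives in the Python backend and may differ.

```javascript
// Hypothetical illustration of the documented precedence:
//   explicit > GITPILOT_SANDBOX env > settings.json > "subprocess"
// Names and argument shapes here are assumptions, not GitPilot's API.
const VALID_BACKENDS = ["subprocess", "matrixlab", "off"];

function resolveBackend({ explicit, env = {}, settings = {} } = {}) {
  const persisted = settings.sandbox ? settings.sandbox.backend : undefined;
  const candidates = [
    { source: "explicit", value: explicit },
    { source: "env", value: env.GITPILOT_SANDBOX },
    { source: "settings", value: persisted },
    { source: "default", value: "subprocess" },
  ];
  // The first candidate holding a recognised backend wins. The default
  // entry is always valid, so `winner` is never undefined.
  const winner = candidates.find((c) => VALID_BACKENDS.includes(c.value));
  return {
    backend: winner.value,
    source: winner.source,
    // Assumption: the override is reported whenever the env var is what won.
    env_override: winner.source === "env" ? "GITPILOT_SANDBOX" : null,
  };
}
```

For example, calling `resolveBackend` with `env.GITPILOT_SANDBOX` set to `"matrixlab"` and a persisted backend of `"off"` yields `matrixlab` with the override flagged, which is the situation the **env override** badge is meant to explain.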
+
+## How the three surfaces share one path
+
+```
+┌─────────────────────┐      ┌──────────────────────┐
+│ Chat ▶ Run button   │      │ Agent run_in_sandbox │
+│ Chat run_command    │      │ Agent run_command    │
+└──────────┬──────────┘      └──────────┬───────────┘
+           │                            │
+           └──────────┬─────────────────┘
+                      ▼
+           ┌──────────────────────┐
+           │ POST /api/sandbox/run│   same backend, same policy,
+           │ {language, code}     │   same error envelope
+           └──────────┬───────────┘
+                      │
+        ┌─────────────┼──────────────┐
+        ▼             ▼              ▼
+SubprocessSandbox NullSandbox MatrixLabSandbox ──► POST /code/run
+    (default)       (off)                          on the Runner
+```
+
+- The **frontend ▶ Run button** in chat (`frontend/components/RunnableCodeBlock.jsx`)
+  POSTs the fenced snippet to `/api/sandbox/run`.
+- The **agent's `run_in_sandbox` tool** is the same HTTP call wrapped as a
+  CrewAI tool, so a single binding governs both human and autonomous runs.
+- The **agent's `run_command` tool** routes through the same endpoint:
+  `bash` → `language=bash, code=` against the configured backend.
+
+## Configuration
+
+### From the UI
+
+`Settings → Sandbox runtime` shows a radio (Local / MatrixLab / Pass-through)
+plus a MatrixLab card with URL, bearer token (write-only — saved tokens
+display as bullets), default image, network egress toggle, timeout, and a
+**Test connection** button.
+
+### From the environment
+
+| Var                                       | Effect                                                            |
+| ----------------------------------------- | ----------------------------------------------------------------- |
+| `GITPILOT_SANDBOX`                        | Pins backend (`subprocess` \| `matrixlab` \| `off`)               |
+| `GITPILOT_MATRIXLAB_URL`                  | MatrixLab Runner base URL (default `http://localhost:8000`)       |
+| `GITPILOT_MATRIXLAB_TOKEN`                | Bearer token sent on every request                                |
+| `GITPILOT_MATRIXLAB_IMAGE`                | Default image override (e.g. `matrix-lab-sandbox-python:latest`)  |
+| `GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE`     | Set to `1` to enable the Install / Start / Stop buttons           |
+
+### From `settings.json`
+
+```json
+{
+  "sandbox": {
+    "backend": "matrixlab",
+    "matrixlab_url": "http://localhost:8000",
+    "matrixlab_token": "",
+    "matrixlab_image": "",
+    "allow_network": false,
+    "timeout_sec": 120
+  }
+}
+```
+
+Secrets never round-trip to the browser: `GET /api/settings` returns
+`has_token: true|false` instead of the token itself.
+
+## HTTP API
+
+### `GET /api/sandbox/status`
+
+Returns the live backend, reachability of the configured MatrixLab Runner,
+and `env_override` if an env var is shadowing the persisted choice.
+
+### `PUT /api/sandbox/config`
+
+Updates any subset of the persisted `SandboxSettings`. Unknown backend
+values return `400` (only `subprocess`, `matrixlab`, `off` accepted).
+
+### `POST /api/sandbox/run`
+
+```jsonc
+// request
+{ "language": "python", "code": "print(2 + 2)", "timeout_sec": 60 }
+
+// response
+{
+  "backend": "matrixlab",
+  "language": "python",
+  "command": "python ",
+  "exit_code": 0,
+  "stdout": "4\n",
+  "stderr": "",
+  "duration_ms": 1868,
+  "truncated": false,
+  "timed_out": false,
+  "sandbox_id": "63baa623-…" // assigned by MatrixLab when backend=matrixlab
+}
+```
+
+Supported languages: `python` (`py`), `javascript` (`js`/`node`), `bash`
+(`sh`/`shell`). Unknown languages return `400`. Snippets run in an
+ephemeral tempdir (not the workspace) so file-system side effects don't
+pollute the repo.
+
+### MatrixLab lifecycle
+
+`GET /api/sandbox/matrixlab/lifecycle` reports `installed` (Docker image
+present), `running` (URL reachable), `docker_available`, and
+`lifecycle_enabled` (the env-flag gate). Always safe to call — pure
+inspection.
+
+The mutating endpoints below are gated behind
+`GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE=1`. Without the flag they return
+`403`, never silently execute Docker on behalf of a browser POST.
+
+| Method | Path                              | Action                                  |
+| ------ | --------------------------------- | --------------------------------------- |
+| `POST` | `/api/sandbox/matrixlab/install`  | `docker pull` runner + sandbox images   |
+| `POST` | `/api/sandbox/matrixlab/start`    | `docker run -d` (idempotent by name)    |
+| `POST` | `/api/sandbox/matrixlab/stop`     | `docker stop gitpilot-matrixlab`        |
+
+Each response carries the full `steps` transcript (`cmd`, `exit_code`,
+`stdout`, `stderr`, `duration_ms` per step) so failures are debuggable
+without SSH'ing to the host.
+
+## Error retrieval
+
+The point of running through a sandbox is that failures come back as
+structured signals, not opaque silence. Every backend returns:
+
+- `exit_code` — non-zero on failure; `-1` for "could not launch"
+- `stderr` — full traceback / compiler diagnostic, verbatim
+- `timed_out` — `true` when the runner killed the process
+- `truncated` — `true` when output was clipped at the policy cap
+
+This is what makes autonomous loops productive: the agent can read a
+SyntaxError, plan the fix, and re-run. Same pattern Claude Code, Codex,
+and Cursor use.
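The envelope fields listed above can be folded into a single outcome label, as in the hedged sketch below. `classifyRun` and its label strings are hypothetical names introduced for illustration; checking `timed_out` before `exit_code` is also an assumption, since the document does not say which exit code a timed-out run reports.

```javascript
// Hypothetical helper: maps the documented envelope fields
// (exit_code, timed_out, truncated) to one outcome label.
// Not part of GitPilot; the labels are invented for this sketch.
function classifyRun(result) {
  // Assumption: a timeout takes priority over whatever exit code came back.
  if (result.timed_out) return "timeout";
  // Documented: -1 means the process could not be launched at all.
  if (result.exit_code === -1) return "launch_failed";
  // Documented: any other non-zero exit code is a failure; stderr holds the diagnostic.
  if (result.exit_code !== 0) return "failed";
  // Success, but flag clipped output so a caller does not trust it as complete.
  return result.truncated ? "ok_truncated" : "ok";
}
```

A caller such as an agent loop could branch on the label: re-plan from `stderr` on `"failed"`, retry with a larger `timeout_sec` on `"timeout"`, and surface a configuration problem on `"launch_failed"`.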
+
+Example trace through `run_in_sandbox(language="python", code="raise ValueError('boom')")`:
+
+```
+Sandbox: MatrixLab
+Command: python
+Exit code: 1
+Duration: 440 ms
+--- stderr ---
+Traceback (most recent call last):
+  File "/workspace/main.py", line 1, in <module>
+    raise ValueError("boom")
+ValueError: boom
+sandbox_id: db3e427d-…
+```
+
+## Resource policy
+
+`SandboxPolicy` enforces:
+
+- **Wall-clock timeout** — caller-supplied or `timeout_sec` default (120s,
+  clamped to 600s)
+- **Output cap** — 512 KB per stream; sets `truncated: true` when hit
+- **Network** — `allow_network: false` strips proxy env vars on
+  `subprocess`; rejected at egress on `matrixlab`
+- **Secret stripping** — `GITHUB_TOKEN`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`,
+  `WATSONX_API_KEY`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` are never
+  forwarded into the sandbox process
+- **Destructive patterns** — `rm -rf /`, `mkfs`, `dd if=/dev/zero`,
+  `:(){ :|:& };:`, `shutdown -h|-r` blocked before launch
+
+## Quick start
+
+1. `make install && make run` — defaults to `subprocess`, hello-world works.
+2. Switch to MatrixLab once you need real isolation:
+   ```bash
+   curl -X PUT http://localhost:8765/api/sandbox/config \
+     -H 'content-type: application/json' \
+     -d '{"backend": "matrixlab", "matrixlab_url": "http://localhost:8000"}'
+   ```
+   …or click the radio in **Settings → Sandbox runtime**.
+3. Run a snippet:
+   ```bash
+   curl -X POST http://localhost:8765/api/sandbox/run \
+     -H 'content-type: application/json' \
+     -d '{"language": "python", "code": "print(2 + 2)"}'
+   ```
+
+## See also
+
+- `gitpilot/sandbox.py` — backend abstraction (`NullSandbox`,
+  `SubprocessSandbox`, `MatrixLabSandbox`) + `SandboxPolicy`
+- `gitpilot/sandbox_api.py` — HTTP surface, lifecycle endpoints
+- `gitpilot/local_tools.py` — agent `run_command` + `run_in_sandbox` tools
+- `frontend/components/SettingsModal.jsx` — Sandbox runtime panel
+- `frontend/components/RunnableCodeBlock.jsx` — chat ▶ Run button
+- `tests/test_sandbox.py`, `tests/test_sandbox_api.py` — 28 unit tests
diff --git a/frontend/App.jsx b/frontend/App.jsx
index 693b5d2..72c9711 100644
--- a/frontend/App.jsx
+++ b/frontend/App.jsx
@@ -402,6 +402,47 @@ export default function App() {
    * first, ChatPanel would see an empty messages array, then our async
    * hydration would complete but ChatPanel wouldn't re-sync.
    */
+  // Resolve the branch we should jump to when reopening a session.
+  // Preference order:
+  //   1. session.repos[i].branch for the active_repo (multi-repo)
+  //   2. session.branch (legacy single-repo field)
+  // Returns ``null`` when nothing is recorded.
+  const resolveSessionBranch = (session) => {
+    if (!session) return null;
+    if (Array.isArray(session.repos) && session.repos.length > 0) {
+      const target =
+        session.repos.find(
+          (r) => session.active_repo && r?.full_name === session.active_repo,
+        ) || session.repos[0];
+      if (target?.branch) return target.branch;
+    }
+    return session.branch || null;
+  };
+
+  // Probe whether a branch still exists on GitHub. We deliberately
+  // reuse the existing tree endpoint instead of adding a new one — a
+  // 200 means the ref resolves, anything else (most importantly 404)
+  // means the branch is gone or otherwise unreachable. Failure
+  // degrades to "branch unknown" so a transient network blip falls
+  // back gracefully rather than misleading the user.
+  const probeBranchExists = async (repoFullName, branch) => {
+    if (!repoFullName || !branch) return false;
+    try {
+      const token = localStorage.getItem("github_token");
+      const headers = {};
+      if (token) headers["Authorization"] = `Bearer ${token}`;
+      const res = await fetch(
+        apiUrl(
+          `/api/repos/${repoFullName}/tree?ref=${encodeURIComponent(branch)}`,
+        ),
+        { headers },
+      );
+      return res.ok;
+    } catch {
+      return false;
+    }
+  };
+
   const handleSelectSession = useCallback(async (session) => {
     // 1. Fetch persisted messages first
     const messages = await fetchSessionMessages(session.id);
@@ -418,11 +459,31 @@ export default function App() {
     // 3. NOW activate the session — ChatPanel's sync effect will read
     //    the hydrated messages from chatBySession[session.id]
     setActiveSessionId(session.id);
-    if (session.branch && session.branch !== currentBranch) {
-      handleBranchChange(session.branch);
+
+    // 4. Jump to the branch this session last published to, but verify
+    //    it still exists on GitHub first. When the branch was deleted
+    //    (rebased away, merged-and-pruned, …) fall back to the
+    //    repository's default branch and tell the user what happened —
+    //    silently landing on the default would mask data loss.
+    const target = resolveSessionBranch(session);
+    if (target && target !== currentBranch) {
+      const repoFullName =
+        session.repo ||
+        (Array.isArray(session.repos) && session.repos[0]?.full_name);
+      const exists = await probeBranchExists(repoFullName, target);
+      if (exists) {
+        handleBranchChange(target);
+      } else {
+        const fallback = defaultBranch || "main";
+        showToast(
+          "Branch not found",
+          `'${target}' was not found on GitHub. Switched to ${fallback}.`,
+        );
+        if (fallback !== currentBranch) handleBranchChange(fallback);
+      }
     }
     // eslint-disable-next-line react-hooks/exhaustive-deps
-  }, [fetchSessionMessages, currentBranch]);
+  }, [fetchSessionMessages, currentBranch, defaultBranch]);
 
   const handleDeleteSession = useCallback(
     (deletedId) => {
diff --git a/frontend/components/AssistantMessage.jsx b/frontend/components/AssistantMessage.jsx
index 9ec8c00..ec75621 100644
--- a/frontend/components/AssistantMessage.jsx
+++ b/frontend/components/AssistantMessage.jsx
@@ -1,5 +1,6 @@
 import React from "react";
 import PlanView from "./PlanView.jsx";
+import RunnableCodeBlock, { splitFences } from "./RunnableCodeBlock.jsx";
 
 export default function AssistantMessage({ answer, plan, executionLog, planStatus }) {
   // ``planStatus`` is optional metadata about the lifecycle of the plan
@@ -82,13 +83,22 @@ export default function AssistantMessage({ answer, plan, executionLog, planStatu
   return (
- {/* Answer section */} + {/* Answer section. ``splitFences`` cuts the answer at fenced code + blocks so each runnable snippet gets its own RunnableCodeBlock + (with a per-block Run button); the surrounding prose still + renders as the existing pre-wrapped paragraph. */}

Answer

-

{answer}

+ {splitFences(answer).map((seg, i) => + seg.type === "code" ? ( + + ) : ( +

{seg.value}

+ ) + )}
diff --git a/frontend/components/ChatPanel.jsx b/frontend/components/ChatPanel.jsx index c66d274..4ccce56 100644 --- a/frontend/components/ChatPanel.jsx +++ b/frontend/components/ChatPanel.jsx @@ -3,6 +3,7 @@ import React, { useEffect, useRef, useState } from "react"; import AssistantMessage from "./AssistantMessage.jsx"; import ThinkingIndicator from "./ThinkingIndicator.jsx"; import ContextMeter from "./ContextMeter.jsx"; +import TasksPanel from "./TasksPanel.jsx"; import DiffStats from "./DiffStats.jsx"; import DiffViewer from "./DiffViewer.jsx"; import CreatePRButton from "./CreatePRButton.jsx"; @@ -35,6 +36,10 @@ export default function ChatPanel({ const [loadingPlan, setLoadingPlan] = useState(false); const [executing, setExecuting] = useState(false); const [status, setStatus] = useState(""); + // Batch B9 — populated when a plan whose first step was INDEX is + // rejected. Lets us render a small "Run with grep instead?" prompt + // so the user doesn't have to retype the goal. + const [retryAfterIndexReject, setRetryAfterIndexReject] = useState(null); // Claude-Code-on-Web: WebSocket streaming + diff + PR const [wsConnected, setWsConnected] = useState(false); @@ -255,16 +260,27 @@ export default function ChatPanel({ if (m.executionLog) meta.executionLog = m.executionLog; if (m.diff) meta.diff = m.diff; if (m.actions) meta.actions = m.actions; + // Informational plans (READ-only answers to "what does X do?" style + // questions) carry no Approve/Reject controls — pin the flag so the + // session reload re-renders the same shape. + if (m.informational) meta.informational = true; return Object.keys(meta).length > 0 ? meta : null; }; - const send = async () => { - if (!repo || !goal.trim()) return; + const send = async (overrides = {}) => { + if (!repo) return; + // Allow callers (e.g. the "Retry with grep" button on a rejected + // INDEX plan) to drive send() with a fixed goal and a router flag. 
+ const overrideGoal = overrides.goal; + const force_no_rag = Boolean(overrides.force_no_rag); + const sourceText = overrideGoal != null ? overrideGoal : goal; + if (!sourceText || !sourceText.trim()) return; - const text = goal.trim(); + const text = sourceText.trim(); - // Clear input immediately (Claude Code behavior) - setGoal(""); + // Clear input immediately (Claude Code behavior) — but only when + // the user typed; programmatic retries leave the input alone. + if (overrideGoal == null) setGoal(""); // Reset textarea height const ta = document.querySelector(".chat-input"); if (ta) ta.style.height = "40px"; @@ -319,6 +335,13 @@ export default function ChatPanel({ repo_name: repo.name, goal: text, branch_name: effectiveBranch, + // Lets the backend record this plan as a Task on the + // session so the right-sidebar Tasks panel can trace it. + session_id: sid, + // Batch B9 — set on the "Retry with grep" path after the + // user rejects an INDEX-plan. Tells the router to + // suppress RAG / INDEX recommendations. + force_no_rag, }), signal: planController.signal, }); @@ -349,29 +372,56 @@ export default function ChatPanel({ throw new Error(detail || "Failed to generate plan"); } - // Guard: a plan with no executable file actions is not a plan we - // can approve. This happens when the planner/explorer agents - // refused (tool-loop hallucination or a real safety refusal) and - // CrewAI returned a schema-valid but empty payload. Without - // this guard the Approve & execute / Reject plan buttons would - // render against a payload that can't actually be executed. + // Classify the plan into one of three kinds so we can render the + // right shape — not just "valid or banner": + // + // * executable — at least one CREATE/MODIFY/DELETE → plan card + // with Approve & execute / Reject controls. 
+ // * informational — every file is READ (or no files at all on a + // step that still has a meaningful description) + // AND the summary is a real answer, not the + // placeholder. This is what happens when the + // user asks "what do you think about this + // project?" — the planner correctly READs the + // relevant files and the summary IS the answer. + // Render the summary as a normal assistant + // message; do not show plan controls. + // * empty — no steps OR no actionable signal at all → + // honest failure banner. + // + // Before this classifier the second case was treated as the + // third, surfacing "I couldn't produce a plan" on perfectly + // valid READ-only plans. const planSteps = Array.isArray(data?.steps) ? data.steps : Array.isArray(data?.plan?.steps) ? data.plan.steps : []; - const hasExecutableFiles = planSteps.some( + const PLACEHOLDER_SUMMARY = "Here is the proposed plan for your request."; + const summary = + data.plan?.summary || data.summary || data.message || PLACEHOLDER_SUMMARY; + const hasExecutable = planSteps.some( (s) => Array.isArray(s?.files) && s.files.some((f) => ["CREATE", "MODIFY", "DELETE"].includes(f?.action)), ); - - // Extract summary from nested plan structure or top-level - const summary = - data.plan?.summary || data.summary || data.message || - "Here is the proposed plan for your request."; - - if (hasExecutableFiles) { + const isReadOnly = + planSteps.length > 0 && + !hasExecutable && + planSteps.every( + (s) => + !Array.isArray(s?.files) || + s.files.length === 0 || + s.files.every((f) => f?.action === "READ"), + ); + const hasRealSummary = Boolean(summary) && summary !== PLACEHOLDER_SUMMARY; + const planKind = hasExecutable + ? "executable" + : isReadOnly && hasRealSummary + ? 
"informational" + : "empty"; + + if (planKind === "executable") { setPlan(data); const assistantMsg = { from: "ai", @@ -382,18 +432,30 @@ export default function ChatPanel({ }; setMessages((prev) => [...prev, assistantMsg]); persistMessage(sid, "assistant", summary, pickAssistantMetadata(assistantMsg)); + } else if (planKind === "informational") { + // The summary is the answer. No plan card, no Approve/Reject — + // there is nothing to execute. We deliberately do NOT attach + // ``plan: data`` here so AssistantMessage renders this turn + // exactly like a chat reply. + setPlan(null); + const assistantMsg = { + from: "ai", + role: "assistant", + answer: summary, + content: summary, + informational: true, + }; + setMessages((prev) => [...prev, assistantMsg]); + persistMessage(sid, "assistant", summary, pickAssistantMetadata(assistantMsg)); } else { - // No executable steps — surface a clear failure to the user - // instead of half-rendering a plan card and dangling buttons. - // The most common cause is the explorer/planner agent loop - // (CrewAI same-input limiter blocks repeat tool calls, the - // agent panics and "refuses"). Encourage a retry rather than - // letting the user click Approve on nothing. + // empty — be honest about what we know. The earlier wording + // ("got stuck reading the same file twice") was a guess from + // an older bug; for the cases that actually still hit this + // branch the real signal is just "no actionable steps". setPlan(null); const failureText = - "I couldn't produce a plan for that request. The agent may have " + - "got stuck reading the same file twice. Try rephrasing, or " + - "switch to a stronger model in Settings → Provider."; + "The model returned an empty plan. 
Try rephrasing more concretely, " + + "or pick a stronger model in Settings → Provider."; const failureMsg = { from: "ai", role: "system", @@ -401,7 +463,7 @@ export default function ChatPanel({ }; setMessages((prev) => [...prev, failureMsg]); persistMessage(sid, "system", failureText); - setStatus("No executable plan produced."); + setStatus("No actionable plan produced."); return; } } catch (err) { @@ -432,6 +494,17 @@ export default function ChatPanel({ // --------------------------------------------------------------------------- const rejectPlan = () => { if (!plan || executing) return; + + // Batch B9 — if the rejected plan contained an INDEX step, the + // user is implicitly saying "I don't want to build the semantic + // index right now". Stash the original goal so we can offer a + // one-click "retry with grep" path on the next render. + const hadIndexStep = Array.isArray(plan?.steps) && + plan.steps.some((s) => + Array.isArray(s?.files) && s.files.some((f) => f?.action === "INDEX"), + ); + const rejectedGoal = plan?.goal || ""; + setPlan(null); setStatus("Plan rejected. No files were changed."); @@ -445,6 +518,12 @@ export default function ChatPanel({ if (sessionId) { persistMessage(sessionId, "system", rejectionMsg.content); } + + if (hadIndexStep && rejectedGoal) { + setRetryAfterIndexReject({ goal: rejectedGoal }); + } else { + setRetryAfterIndexReject(null); + } }; const execute = async () => { @@ -471,6 +550,10 @@ export default function ChatPanel({ repo_name: repo.name, plan, branch_name, + // Lets the backend persist the new branch on the session + // record so reopening this session lands on the published + // branch, not the one it was created on. + session_id: sessionId, }), }); @@ -778,6 +861,54 @@ export default function ChatPanel({
+ {/* Batch B9 — post-Reject "retry with grep" prompt. Renders + only when the user rejected a plan whose first step was an + INDEX action. One click re-issues the same goal with + force_no_rag so the router falls back to grep. */} + {retryAfterIndexReject && !loadingPlan && ( +
+ + Index skipped. Run the same goal with grep instead? + + + + + +
+ )} + {/* Diff stats bar (when agent has made changes) */} {diffData && (
)} - + + + +
diff --git a/frontend/components/PlanView.jsx b/frontend/components/PlanView.jsx index a67efb2..b543f82 100644 --- a/frontend/components/PlanView.jsx +++ b/frontend/components/PlanView.jsx @@ -4,7 +4,7 @@ export default function PlanView({ plan }) { if (!plan) return null; // Calculate totals for each action type - const totals = { CREATE: 0, MODIFY: 0, DELETE: 0 }; + const totals = { CREATE: 0, MODIFY: 0, DELETE: 0, INDEX: 0 }; plan.steps.forEach((step) => { step.files.forEach((file) => { totals[file.action] = (totals[file.action] || 0) + 1; @@ -75,6 +75,25 @@ export default function PlanView({ plan }) { color: theme.dangerText, borderColor: "rgba(239, 68, 68, 0.2)", }, + totalIndex: { + // GitPilot orange — the same brand colour the rest of the app + // uses for "infrastructure / one-time" actions. Visually + // distinct from CREATE / MODIFY / DELETE so users know this + // step doesn't write code. + backgroundColor: "rgba(217, 92, 61, 0.10)", + color: "#D95C3D", + borderColor: "rgba(217, 92, 61, 0.25)", + }, + indexNotice: { + marginTop: "8px", + fontSize: "12px", + color: "#D95C3D", + backgroundColor: "rgba(217, 92, 61, 0.05)", + padding: "8px 12px", + borderRadius: "6px", + border: "1px solid rgba(217, 92, 61, 0.15)", + lineHeight: "1.5", + }, stepsList: { listStyle: "none", padding: 0, @@ -161,6 +180,7 @@ export default function PlanView({ plan }) { case "CREATE": return styles.totalCreate; case "MODIFY": return styles.totalModify; case "DELETE": return styles.totalDelete; + case "INDEX": return styles.totalIndex; default: return {}; } }; @@ -190,6 +210,11 @@ export default function PlanView({ plan }) { {totals.DELETE} to delete )} + {totals.INDEX > 0 && ( + + {totals.INDEX === 1 ? "1 setup step" : `${totals.INDEX} setup steps`} + + )} {/* Steps List */} @@ -210,12 +235,30 @@ export default function PlanView({ plan }) { {file.action} - {file.path} + + {file.action === "INDEX" + ? 
"Build semantic index for this repo" + : file.path} + ))} )} + {/* B9: explain the INDEX step's cost so users can decide + informedly before clicking Approve. */} + {s.files && s.files.some((f) => f.action === "INDEX") && ( +
+ 📦 One-time semantic index build. + Embeds every file locally with MiniLM-L6-v2 (~80 MB + model on first run, ~30 s wall time for a typical + repo, ~12 MB on disk). No cloud calls. Makes future + "find / where / how" queries instant. Click{" "} + Reject plan to skip — you'll be + offered the grep fallback. +
+ )} + {/* Risks */} {s.risks && (
diff --git a/frontend/components/RunnableCodeBlock.jsx b/frontend/components/RunnableCodeBlock.jsx new file mode 100644 index 0000000..4e03d66 --- /dev/null +++ b/frontend/components/RunnableCodeBlock.jsx @@ -0,0 +1,283 @@ +import React, { useState } from "react"; + +// Languages the Run button supports. Anything not in this set still +// renders as a normal code block (no button) — keeps the visual contract +// honest: if there's a button, the snippet really is executable. +const RUNNABLE = new Set([ + "python", "py", + "javascript", "js", "node", + "bash", "sh", "shell", +]); + +// Friendly badge text per backend, surfaced so the user always knows +// which sandbox actually ran their code. Mirrors the labels in +// SettingsModal so the two views agree. +const BACKEND_LABELS = { + subprocess: "Local", + matrixlab: "MatrixLab", + off: "Pass-through", +}; + +// Map "py" → "python" etc. so the badge always shows the canonical +// language name rather than whatever alias the LLM tagged the fence +// with. +const LANG_DISPLAY = { + py: "python", + js: "javascript", + node: "javascript", + sh: "bash", + shell: "bash", +}; + +/** A single fenced code block with a per-block Run button. 
*/ +export default function RunnableCodeBlock({ language, code }) { + const lang = (language || "").trim().toLowerCase(); + const canRun = RUNNABLE.has(lang); + const [busy, setBusy] = useState(false); + const [result, setResult] = useState(null); + const [error, setError] = useState(null); + const display = LANG_DISPLAY[lang] || lang || "text"; + + const onRun = async () => { + setBusy(true); + setResult(null); + setError(null); + try { + const res = await fetch("/api/sandbox/run", { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ language: lang, code }), + }); + const data = await res.json(); + if (!res.ok) { + setError(data.detail || `HTTP ${res.status}`); + return; + } + setResult(data); + } catch (err) { + setError(err.message || "Run failed"); + } finally { + setBusy(false); + } + }; + + const copy = () => { + if (navigator?.clipboard) navigator.clipboard.writeText(code).catch(() => {}); + }; + + return ( +
+
+ {display} +
+ + {canRun && ( + + )} +
+
+
{code}
+ + {(result || error) && ( +
+
+ Output + {result && ( + + + exit {result.exit_code} + + + {BACKEND_LABELS[result.backend] || result.backend} + + {typeof result.duration_ms === "number" && ( + {result.duration_ms} ms + )} + {result.timed_out && timed out} + {result.truncated && truncated} + + )} +
+ {error &&
{error}
} + {result?.stdout &&
{result.stdout}
} + {result?.stderr &&
{result.stderr}
} + {result && !result.stdout && !result.stderr && ( +
(no output)
+ )} +
+ )} +
+ ); +} + +/** Split a markdown-ish string into text and fenced-code segments. + * + * Returned shape: ``[{type: 'text', value} | {type: 'code', language, code}]``. + * + * Kept deliberately small — full markdown rendering is out of scope; this + * only needs to recognise ```lang fences so the Run button can attach to + * code blocks the model emits. */ +export function splitFences(input) { + if (!input) return []; + const out = []; + const re = /```([a-zA-Z0-9_+-]*)\s*\n([\s\S]*?)```/g; + let last = 0; + let m; + while ((m = re.exec(input)) !== null) { + if (m.index > last) { + out.push({ type: "text", value: input.slice(last, m.index) }); + } + out.push({ type: "code", language: m[1] || "", code: m[2].replace(/\s+$/, "") }); + last = m.index + m[0].length; + } + if (last < input.length) { + out.push({ type: "text", value: input.slice(last) }); + } + return out; +} + +const styles = { + wrap: { + margin: "8px 0", + background: "#09090B", + border: "1px solid #27272A", + borderRadius: 8, + overflow: "hidden", + fontFamily: '-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif', + }, + head: { + display: "flex", + alignItems: "center", + justifyContent: "space-between", + padding: "6px 12px", + background: "#18181B", + borderBottom: "1px solid #27272A", + fontSize: 11, + }, + headRight: { display: "flex", gap: 6, alignItems: "center" }, + lang: { + color: "#A1A1AA", + fontWeight: 600, + textTransform: "uppercase", + letterSpacing: "0.05em", + fontSize: 10, + }, + iconBtn: { + background: "transparent", + color: "#A1A1AA", + border: "1px solid #3F3F46", + borderRadius: 4, + padding: "2px 8px", + fontSize: 11, + cursor: "pointer", + }, + runBtn: { + background: "#10B981", + color: "#052e1c", + border: "0", + borderRadius: 4, + padding: "2px 10px", + fontSize: 11, + fontWeight: 600, + cursor: "pointer", + }, + code: { + margin: 0, + padding: "12px 14px", + fontFamily: "ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, monospace", + fontSize: 12.5, + 
lineHeight: 1.55, + color: "#E4E4E7", + whiteSpace: "pre-wrap", + wordBreak: "break-word", + overflowX: "auto", + }, + output: { + background: "#0c0c10", + borderTop: "1px solid #27272A", + padding: "8px 14px 10px", + }, + outputHead: { + display: "flex", + alignItems: "center", + justifyContent: "space-between", + marginBottom: 6, + }, + outputLabel: { + fontSize: 10, + fontWeight: 600, + color: "#A1A1AA", + textTransform: "uppercase", + letterSpacing: "0.05em", + }, + metaRow: { display: "flex", gap: 6, alignItems: "center" }, + okPill: { + fontSize: 10, + fontWeight: 600, + padding: "1px 6px", + borderRadius: 9, + background: "rgba(16, 185, 129, 0.12)", + color: "#10B981", + border: "1px solid rgba(16, 185, 129, 0.35)", + }, + failPill: { + fontSize: 10, + fontWeight: 600, + padding: "1px 6px", + borderRadius: 9, + background: "rgba(239, 68, 68, 0.12)", + color: "#ef4444", + border: "1px solid rgba(239, 68, 68, 0.35)", + }, + warnPill: { + fontSize: 10, + fontWeight: 600, + padding: "1px 6px", + borderRadius: 9, + background: "rgba(217, 119, 6, 0.12)", + color: "#f59e0b", + border: "1px solid rgba(217, 119, 6, 0.35)", + }, + backendPill: { + fontSize: 10, + fontWeight: 600, + padding: "1px 6px", + borderRadius: 9, + background: "rgba(79, 70, 229, 0.12)", + color: "#a5b4fc", + border: "1px solid rgba(79, 70, 229, 0.35)", + }, + dim: { color: "#71717A", fontSize: 11 }, + stdout: { + margin: "4px 0 0", + padding: "6px 8px", + fontFamily: "ui-monospace, SFMono-Regular, Menlo, monospace", + fontSize: 12, + color: "#D4D4D8", + background: "#000", + borderRadius: 4, + whiteSpace: "pre-wrap", + wordBreak: "break-word", + }, + stderr: { + margin: "4px 0 0", + padding: "6px 8px", + fontFamily: "ui-monospace, SFMono-Regular, Menlo, monospace", + fontSize: 12, + color: "#fca5a5", + background: "#0a0000", + borderRadius: 4, + whiteSpace: "pre-wrap", + wordBreak: "break-word", + }, +}; diff --git a/frontend/components/SettingsModal.jsx b/frontend/components/SettingsModal.jsx 
index 24d43b1..61a44dd 100644 --- a/frontend/components/SettingsModal.jsx +++ b/frontend/components/SettingsModal.jsx @@ -1,5 +1,23 @@ import React, { useEffect, useState } from "react"; +const SANDBOX_BACKENDS = [ + { + id: "subprocess", + label: "Local", + sub: "Host subprocess with a workspace jail. Default — best for trying simple snippets.", + }, + { + id: "matrixlab", + label: "MatrixLab", + sub: "Containerised, ephemeral sandboxes from a MatrixLab Runner. Recommended for enterprise.", + }, + { + id: "off", + label: "Pass-through", + sub: "Run on the host with no jail. Local development only.", + }, +]; + export default function SettingsModal({ onClose }) { const [settings, setSettings] = useState(null); const [models, setModels] = useState([]); @@ -7,15 +25,128 @@ export default function SettingsModal({ onClose }) { const [loadingModels, setLoadingModels] = useState(false); const [testResult, setTestResult] = useState(null); // { ok: bool, message: string } const [testing, setTesting] = useState(false); + // Sandbox runtime state. ``sandbox`` is the persisted block from the + // settings response; ``sandboxStatus`` is the live probe result + // (ok / error). Both are independent of LLM settings so a failed + // MatrixLab probe doesn't block provider switching. + const [sandbox, setSandbox] = useState(null); + const [sandboxStatus, setSandboxStatus] = useState(null); + const [sandboxTokenInput, setSandboxTokenInput] = useState(""); + const [sandboxBusy, setSandboxBusy] = useState(false); + // MatrixLab lifecycle state — separate from the sandbox runtime state + // because the lifecycle endpoints can run for many seconds (docker + // pulls) and we don't want to block the "switch backend" buttons on + // a running install. 
+ const [lifecycle, setLifecycle] = useState(null); + const [lifecycleBusy, setLifecycleBusy] = useState(null); // "install" | "start" | "stop" | null + const [lifecycleLog, setLifecycleLog] = useState([]); + const [showLifecycleLog, setShowLifecycleLog] = useState(false); const loadSettings = async () => { const res = await fetch("/api/settings"); const data = await res.json(); setSettings(data); + if (data?.sandbox) setSandbox(data.sandbox); + }; + + const loadSandboxStatus = async () => { + try { + const res = await fetch("/api/sandbox/status"); + const data = await res.json(); + setSandboxStatus({ ok: data.ok, error: data.error, remote: data.remote }); + // /status returns the same shape as the persisted block, so refresh + // the form state from it — the env vars may override settings.json + // and we want the UI to show what's actually live. + setSandbox((prev) => ({ + ...(prev || {}), + backend: data.backend, + matrixlab_url: data.matrixlab_url, + matrixlab_image: data.matrixlab_image, + allow_network: data.allow_network, + timeout_sec: data.timeout_sec, + has_token: data.has_token, + })); + } catch (err) { + setSandboxStatus({ ok: false, error: err.message || "status probe failed" }); + } + }; + + const updateSandbox = async (patch) => { + setSandboxBusy(true); + try { + const res = await fetch("/api/sandbox/config", { + method: "PUT", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify(patch), + }); + const data = await res.json(); + if (!res.ok) { + setSandboxStatus({ ok: false, error: data.detail || "update failed" }); + return; + } + setSandbox((prev) => ({ + ...(prev || {}), + backend: data.backend, + matrixlab_url: data.matrixlab_url, + matrixlab_image: data.matrixlab_image, + allow_network: data.allow_network, + timeout_sec: data.timeout_sec, + has_token: data.has_token, + })); + setSandboxStatus({ ok: data.ok, error: data.error, remote: data.remote }); + // Always clear the local token input after a save so a stale value + // 
doesn't sit in the DOM. The backend stores it; we don't need to + // hold it client-side. + if ("matrixlab_token" in patch) setSandboxTokenInput(""); + } finally { + setSandboxBusy(false); + } + }; + + const loadLifecycle = async () => { + try { + const res = await fetch("/api/sandbox/matrixlab/lifecycle"); + const data = await res.json(); + setLifecycle(data); + if (Array.isArray(data.steps) && data.steps.length) { + setLifecycleLog(data.steps); + } + } catch (err) { + setLifecycle({ + docker_available: false, + installed: false, + running: false, + lifecycle_enabled: false, + error: err.message || "lifecycle probe failed", + }); + } + }; + + const runLifecycle = async (action) => { + if (!["install", "start", "stop"].includes(action)) return; + setLifecycleBusy(action); + setShowLifecycleLog(true); + try { + const res = await fetch(`/api/sandbox/matrixlab/${action}`, { method: "POST" }); + const data = await res.json(); + if (!res.ok) { + setLifecycle((prev) => ({ ...(prev || {}), error: data.detail || `HTTP ${res.status}` })); + return; + } + setLifecycle(data); + setLifecycleLog(data.steps || []); + // Refresh the runtime status — a successful start should flip + // sandboxStatus.ok to true. + loadSandboxStatus(); + } finally { + setLifecycleBusy(null); + } }; useEffect(() => { loadSettings(); + loadSandboxStatus(); + loadLifecycle(); }, []); const changeProvider = async (provider) => { @@ -327,6 +458,376 @@ export default function SettingsModal({ onClose }) { phi-3-mini, gemma-2b, tinyllama, etc.
+ + {/* Sandbox Runtime section — controls the Run button on chat + code blocks. Local subprocess is the default so users can + try simple snippets immediately; MatrixLab is the enterprise + opt-in for containerised isolation. */} + {sandbox && ( +
+
+
Sandbox runtime
+ {sandboxStatus && ( + + + {sandboxStatus.ok ? "Reachable" : "Unreachable"} + + )} +
+
+ Where the Run button on generated code blocks executes. Choose Local + for a quick try, or install MatrixLab and switch to it for isolated + enterprise sandboxes. +
+ +
+ {SANDBOX_BACKENDS.map((b) => ( + + ))} +
+ + {sandbox.backend === "matrixlab" && ( +
+
+ + setSandbox({ ...sandbox, matrixlab_url: e.target.value })} + onBlur={() => updateSandbox({ matrixlab_url: sandbox.matrixlab_url })} + placeholder="http://localhost:8000" + style={{ + fontSize: 12, padding: "4px 6px", + background: "#14152a", color: "#e6e8ff", + border: "1px solid #2c2d46", borderRadius: 4, + }} + /> + +
+ setSandboxTokenInput(e.target.value)} + placeholder={sandbox.has_token ? "•••••••• (saved)" : "Optional"} + style={{ + flex: 1, + fontSize: 12, padding: "4px 6px", + background: "#14152a", color: "#e6e8ff", + border: "1px solid #2c2d46", borderRadius: 4, + }} + /> + + {sandbox.has_token && ( + + )} +
+ + setSandbox({ ...sandbox, matrixlab_image: e.target.value })} + onBlur={() => updateSandbox({ matrixlab_image: sandbox.matrixlab_image })} + placeholder="matrixlab-python (let runner pick)" + style={{ + fontSize: 12, padding: "4px 6px", + background: "#14152a", color: "#e6e8ff", + border: "1px solid #2c2d46", borderRadius: 4, + }} + /> +
+
+ )} + + {/* MatrixLab lifecycle card — only shown when MatrixLab is + the selected backend. The button label tracks the + detected state: Install → Start → Running. When the + operator hasn't enabled GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE + the actions are disabled and an inline hint explains how + to flip the env flag. */} + {sandbox.backend === "matrixlab" && lifecycle && ( +
+
+
+ MatrixLab lifecycle +
+ + + {lifecycle.running ? "Running" + : lifecycle.installed ? "Installed · stopped" + : "Not installed"} + + {sandbox.env_override && ( + + env override + + )} +
+ +
+ {/* Running > Installed > Not-installed. Checking + ``running`` first matters when the operator + brought MatrixLab up from source (e.g. `make run` + inside a checkout) so the image tag doesn't match + our canonical ``ruslanmv/matrixlab-runner:latest`` + — the URL still answers, just don't offer to + install on top of a healthy runner. */} + {lifecycle.running ? ( + + ) : lifecycle.installed ? ( + + ) : ( + + )} + + {lifecycleLog.length > 0 && ( + + )} +
+ + {lifecycle.instructions && ( +
+ {lifecycle.instructions} +
+ )} + {lifecycle.error && ( +
+ {lifecycle.error} +
+ )} + + {/* Per-step transcript — surfaced so failures are + debuggable from the UI without SSH'ing to the host. */} + {showLifecycleLog && lifecycleLog.length > 0 && ( +
+ {lifecycleLog.map((step, i) => ( +
+
$ {step.cmd}
+
+ exit {step.exit_code} · {step.duration_ms} ms +
+ {step.stdout &&
{step.stdout}
} + {step.stderr &&
{step.stderr}
} +
+ ))} +
+ )} +
+ )} + +
+ + + +
+ + {sandboxStatus?.error && ( +
+ {sandboxStatus.error} +
+ )} + {sandbox.backend === "matrixlab" && sandboxStatus?.ok && sandboxStatus?.remote?.version && ( +
+ MatrixLab Runner v{sandboxStatus.remote.version} + {typeof sandboxStatus.remote.uptime_s === "number" && + ` · up ${Math.round(sandboxStatus.remote.uptime_s / 60)} min`} +
+ )} +
+ )} ); diff --git a/frontend/components/TasksPanel.jsx b/frontend/components/TasksPanel.jsx new file mode 100644 index 0000000..114d033 --- /dev/null +++ b/frontend/components/TasksPanel.jsx @@ -0,0 +1,382 @@ +// frontend/components/TasksPanel.jsx +// +// Right-sidebar Tasks panel — Claude Code-style trace of every AI +// invocation in the active session. Trigger is a small ⊞ icon next +// to the context meter; clicking it opens a popover anchored to the +// composer rail. +// +// V1 contract (simplest cut): +// - One task per top-level user action (Plan, Execute). +// - Lazy fetch on open + manual ↻ refresh. Zero idle traffic. +// - No cost row. Token counts shown only when the provider exposes +// them; otherwise "—". +// +// GitPilot brand orange #D95C3D is used only for the running-state +// dot — completed is slate, failed is the existing red. No new deps; +// inline styles + scoped + + + + {open && ( +
+

Tasks

+ + {!sessionId && ( +
Start a chat to see tasks here.
+ )} + + {sessionId && loading && tasks.length === 0 && ( +
Loading…
+ )} + + {sessionId && error && error !== "disabled" && ( +
+ Couldn't load: {error} +
+ )} + + {sessionId && !loading && !error && tasks.length === 0 && ( +
No tasks yet.
+ )} + + {running.length > 0 && ( +
+
In flight
+ {running.map((t) => )} +
+ )} + + {completed.length > 0 && ( +
+
+ Completed ({completed.length}) +
+ {completed + .slice() + .reverse() + .map((t) => )} +
+ )} + + {sessionId && ( +
+ One row per AI invocation. + +
+ )} +
+ )} + + ); +} diff --git a/gitpilot/agent_prompts.py b/gitpilot/agent_prompts.py new file mode 100644 index 0000000..a284da8 --- /dev/null +++ b/gitpilot/agent_prompts.py @@ -0,0 +1,414 @@ +"""Lean agent-prompt templates for GitPilot (Batch B12). + +Rewrites every agent persona and task description with the small- +model rules: + +* No emotional intensifiers (CRITICAL, THOROUGHLY). +* No "etc." — explicit list or omit. +* No speculative example filenames (package.json on a repo that + doesn't have one is hallucination bait). +* Facts block lives at the bottom of every prompt — small models + over-weight the last segment. +* Per-intent rule blocks instead of one universal block — we only + inject the create / modify / delete / info rules that match what + the user actually asked for, picked off the B9 query router's + ``RouterDecision.intent``. + +The templates are plain ``str.format``-friendly so callers don't +have to know about the placeholders. Single source of truth so +tests can pin character budgets and forbidden-keyword bans without +chasing duplicated strings across ``agentic.py``. + +Gated by the ``lean_prompts`` feature flag (default **on**). When +off, callers fall back to the legacy verbose prompts in agentic.py. +""" +from __future__ import annotations + +from . import flags + +FLAG_LEAN_PROMPTS = "lean_prompts" + +# Character budgets per prompt — pinned by tests so a future +# "let me add one more rule" edit can't silently bloat them. +# Budget covers framing + a typical 10-15 file list. Larger repos +# pay more in the file-list section — that's the useful facts the +# planner needs and we count it under the planner-stack test instead. +PLAN_TASK_CHAR_BUDGET = 1_400 +EXPLORER_TASK_CHAR_BUDGET = 500 +CREATE_FILE_TASK_CHAR_BUDGET = 700 +MODIFY_FILE_TASK_CHAR_BUDGET = 600 +CODE_WRITER_BACKSTORY_BUDGET = 300 +EXPLORER_BACKSTORY_BUDGET = 200 +PLANNER_BACKSTORY_BUDGET = 250 +SPECIALIST_BACKSTORY_BUDGET = 220 + +# Words to scrub from every prompt. 
These are the high-volume, +# small-model-confusing tokens identified in the inventory. +FORBIDDEN_KEYWORDS = ( + "CRITICAL", + "THOROUGHLY", + "MUST", # emotional imperative — replace with the verb + "etc.", + "package.json", # speculative example file that primed hallucination +) + + +# ---------------------------------------------------------------------- +# Backstories +# ---------------------------------------------------------------------- + +EXPLORER_BACKSTORY = ( + "You inspect repositories using the supplied tools. " + "You report only what the tools return. You do not " + "speculate about files or structure." +) + +EXPLORER_GOAL = ( + "Inspect the repository and produce a fact-only summary" +) + + +PLANNER_BACKSTORY = ( + "You write structured refactor plans from verified repository " + "facts. You only reference files that appear in the supplied " + "file list. DELETE actions require that the file exists in the " + "list. CREATE actions require that the path does not." +) + +PLANNER_GOAL = ( + "Design a JSON plan for the user goal using only verified files" +) + + +CODE_WRITER_BACKSTORY = ( + "You write clean, working code or documentation that matches " + "the requested file path's extension. When reading existing " + "files is needed, you use the supplied tools first." +) + +CODE_WRITER_GOAL = ( + "Generate file content that satisfies the plan step" +) + + +# ---------------------------------------------------------------------- +# Explorer task +# ---------------------------------------------------------------------- + +EXPLORER_TASK_TEMPLATE = """\ +Repository: {repo_full_name} +Active ref: {active_ref} + +Required tool calls (in order): +1. Get repository summary +2. List all files in repository +3. Get directory structure +4. Read README.md only if it appears in the list + +Rules: +- Mention only files returned by the tools. +- Do not invent files or folders. 
+ +Return exactly: + +REPOSITORY EXPLORATION REPORT +Files Found: +- + +Key Files: +- + +Directory Structure: + + +File Types: +=, ... +""" + + +# ---------------------------------------------------------------------- +# Plan task — intent-routed rule blocks +# ---------------------------------------------------------------------- + +# Header + footer wrap each per-intent block. The footer is the +# "facts block" the user flagged: lives at the bottom so small +# models give it the most attention weight. +PLAN_TASK_HEADER_TEMPLATE = """\ +User goal: {goal} +Repository: {repo_full_name} +Active ref: {active_ref} + +Existing files (verified by tools): +{file_list_lines} + +""" + +PLAN_TASK_RULES_CREATE = """\ +Rules: +- The user asked to create new content. Include at least one CREATE file. +- READ existing files only when needed as input for the new file. +- Do not include MODIFY or DELETE unless the goal asks for them. +""" + +PLAN_TASK_RULES_MODIFY = """\ +Rules: +- Use MODIFY only for files in the existing-files list above. +- READ a file when you need its content before modifying. +- Do not CREATE or DELETE unless the goal asks for it. +""" + +PLAN_TASK_RULES_DELETE = """\ +Rules: +- Use DELETE only for files in the existing-files list above. +- Do not include CREATE or MODIFY actions. +- Files the user wants to keep are absent from the plan. +""" + +PLAN_TASK_RULES_FIND = """\ +Rules: +- The user asked a search question. +- Plan READ actions for files likely to contain the answer. +- Include a substantive summary that answers the question. +""" + +PLAN_TASK_RULES_INFO = """\ +Rules: +- The user asked an informational question. +- Empty steps is fine; the summary itself is the answer. +- Use READ only when you need a specific file's content. +""" + +PLAN_TASK_RULES_UNKNOWN = """\ +Rules: +- READ / MODIFY / DELETE only for files in the existing-files list. +- CREATE only for paths NOT in that list. +- Match the action to what the user goal asks for. 
+""" + +# Schema block kept tight — one example object, no prose explanations. +PLAN_TASK_SCHEMA = """\ +Return one JSON object only (no markdown fences, no prose): +{ + "goal": "...", + "summary": "...", + "steps": [ + { + "step_number": 1, + "title": "...", + "description": "...", + "files": [ + {"path": "", "action": "READ"}, + {"path": "", "action": "CREATE"} + ], + "risks": null + } + ] +} + +JSON rules: +- "action" is one of: CREATE, MODIFY, DELETE, READ, INDEX +- "step_number" is a positive integer +- "risks" is either a string or null (the JSON null literal) +- The entire response is the JSON object — nothing before or after +""" + +# Footer = the facts block. Last 200-300 chars of the prompt = +# highest attention weight on small models. +PLAN_TASK_FOOTER_TEMPLATE = """\ +Known facts: +- Total files in repository: {file_count} +- A path NOT in the existing-files list above does NOT exist. +- Never mention a file that is not in that list as if it exists. + +Now produce the JSON plan. +""" + + +_INTENT_TO_RULES = { + "create": PLAN_TASK_RULES_CREATE, + "modify": PLAN_TASK_RULES_MODIFY, + "fix": PLAN_TASK_RULES_MODIFY, # fix = modify under the hood + "delete": PLAN_TASK_RULES_DELETE, + "find": PLAN_TASK_RULES_FIND, + "info": PLAN_TASK_RULES_INFO, + "unknown": PLAN_TASK_RULES_UNKNOWN, +} + + +def render_plan_task( + *, + goal: str, + repo_full_name: str, + active_ref: str, + file_list: list[str], + intent: str | None, +) -> str: + """Build the planner's task description from verified facts. + + ``intent`` is the literal from :class:`gitpilot.query_router.RouterDecision`. + Unknown / missing intent falls back to the generic rule block. 
+ """ + file_lines = "\n".join(f"- {p}" for p in file_list) if file_list else "(empty repository)" + rules = _INTENT_TO_RULES.get((intent or "unknown").lower(), PLAN_TASK_RULES_UNKNOWN) + return ( + PLAN_TASK_HEADER_TEMPLATE.format( + goal=goal, repo_full_name=repo_full_name, + active_ref=active_ref, file_list_lines=file_lines, + ) + + rules + + "\n" + + PLAN_TASK_SCHEMA + + "\n" + + PLAN_TASK_FOOTER_TEMPLATE.format(file_count=len(file_list)) + ) + + +def render_explorer_task(*, repo_full_name: str, active_ref: str) -> str: + return EXPLORER_TASK_TEMPLATE.format( + repo_full_name=repo_full_name, active_ref=active_ref, + ) + + +# ---------------------------------------------------------------------- +# Code-writer tasks — CREATE and MODIFY +# ---------------------------------------------------------------------- + +CREATE_FILE_TASK_TEMPLATE = """\ +Generate the full contents of a new file: {file_path} + +Goal: {goal} +Step context: {step_description} + +Rules: +- Match the file extension's conventions ({extension}). +- If existing files are relevant, use the supplied tools to read them. +- Output ONLY the file content (no explanations, no markdown fences). +""" + +MODIFY_FILE_TASK_TEMPLATE = """\ +Modify the file: {file_path} + +Goal: {goal} +Step context: {step_description} + +Current file content: +{current_content} + +Rules: +- Preserve every line that does not need to change. +- Match the file extension's conventions ({extension}). +- Output ONLY the complete updated file (no explanations, no fences). 
+""" + + +def render_create_file_task( + *, + file_path: str, + goal: str, + step_description: str, +) -> str: + return CREATE_FILE_TASK_TEMPLATE.format( + file_path=file_path, + goal=goal, + step_description=step_description, + extension=_ext_of(file_path), + ) + + +def render_modify_file_task( + *, + file_path: str, + goal: str, + step_description: str, + current_content: str, +) -> str: + return MODIFY_FILE_TASK_TEMPLATE.format( + file_path=file_path, + goal=goal, + step_description=step_description, + extension=_ext_of(file_path), + current_content=current_content, + ) + + +def _ext_of(path: str) -> str: + name = path.rsplit("/", 1)[-1] + if "." not in name: + return name or "(no extension)" + return "." + name.rsplit(".", 1)[-1].lower() + + +# ---------------------------------------------------------------------- +# Specialist agent backstories (Issue / PR / Search / Code Review / …) +# ---------------------------------------------------------------------- +# +# These were ~500-char persona blocks under the previous design. Each +# is now ~150-200 chars: role + scope + single tool-use sentence. + +SPECIALIST_BACKSTORIES = { + "issue_management": ( + "You manage GitHub issues — list, create, comment, label, close, assign. " + "You use the supplied issue tools and report concrete results." + ), + "pr_management": ( + "You manage GitHub pull requests — list, create, review, comment, merge. " + "You use the supplied PR tools and report concrete results." + ), + "search_discovery": ( + "You answer search and discovery questions about the repository. " + "You use file-listing and content-search tools and cite exact matches." + ), + "code_review": ( + "You review code for correctness, clarity, and obvious bugs. " + "You quote the specific lines you reference." + ), + "learning_guidance": ( + "You answer GitHub how-to and convention questions in plain text. " + "You do not modify the repository." 
+ ), + "local_editor": ( + "You edit local files using the supplied filesystem tools. " + "You preserve existing content unless instructed to change it." + ), + "terminal_executor": ( + "You run terminal commands using the supplied shell tool. " + "You explain command output briefly." + ), +} + + +# ---------------------------------------------------------------------- +# Flag check helper +# ---------------------------------------------------------------------- + +def lean_prompts_enabled() -> bool: + """Single source of truth for callers in agentic.py — flag-on means + use the templates here; flag-off falls back to legacy verbose + strings still defined inline in agentic.py.""" + return flags.is_on(FLAG_LEAN_PROMPTS, default=True) + + +__all__ = [ + "FLAG_LEAN_PROMPTS", + "FORBIDDEN_KEYWORDS", + "PLAN_TASK_CHAR_BUDGET", + "EXPLORER_TASK_CHAR_BUDGET", + "CREATE_FILE_TASK_CHAR_BUDGET", + "MODIFY_FILE_TASK_CHAR_BUDGET", + "CODE_WRITER_BACKSTORY_BUDGET", + "EXPLORER_BACKSTORY_BUDGET", + "PLANNER_BACKSTORY_BUDGET", + "SPECIALIST_BACKSTORY_BUDGET", + "EXPLORER_BACKSTORY", + "EXPLORER_GOAL", + "PLANNER_BACKSTORY", + "PLANNER_GOAL", + "CODE_WRITER_BACKSTORY", + "CODE_WRITER_GOAL", + "SPECIALIST_BACKSTORIES", + "lean_prompts_enabled", + "render_create_file_task", + "render_explorer_task", + "render_modify_file_task", + "render_plan_task", +] diff --git a/gitpilot/agent_tools.py b/gitpilot/agent_tools.py index e0a34ea..160e4be 100644 --- a/gitpilot/agent_tools.py +++ b/gitpilot/agent_tools.py @@ -8,7 +8,7 @@ from crewai.tools import tool -from .github_api import get_repo_tree, get_file +from .github_api import get_file, get_repo_tree def _sanitize_tool_arg(value: Any, fallback_key: str = "description") -> str: @@ -198,32 +198,232 @@ def get_directory_structure() -> str: return f"Error: {str(e)}" -@tool("Read file content") -def read_file(file_path: Any) -> str: - """Read the content of a file from the active repository. 
+# ---------------------------------------------------------------------- +# Windowed-Read defaults — match Claude Code's contract +# ---------------------------------------------------------------------- +READ_DEFAULT_LIMIT = 2000 # default line cap when limit is omitted +READ_MAX_LIMIT = 10_000 # hard ceiling — beyond this the caller + # must paginate via offset +GLOB_DEFAULT_MAX_RESULTS = 200 # cap for "Find files matching a pattern" +GLOB_HARD_MAX_RESULTS = 1_000 - file_path: the file's path relative to the repository root, e.g. - "README.md" or "src/main.py". Pass a plain string — do **not** pass - a dict like ``{"description": "...", "type": "str"}`` (that is the - parameter's schema, not its value). + +def _coerce_int(value: Any, default: int) -> int: + """CrewAI sometimes passes ints as strings or dicts. Coerce + safely; anything we can't parse falls back to the default. """ - file_path = _sanitize_tool_arg(file_path) + if value is None: + return default + if isinstance(value, bool): + return default + if isinstance(value, (int, float)): + return int(value) + if isinstance(value, str): + try: + return int(value.strip()) + except (TypeError, ValueError): + return default + if isinstance(value, dict): + # Common CrewAI schema-leak: {"description": "...", "type": "int"} + return default + return default + + +@tool("Find files matching a pattern") +def list_repository_files_glob( + pattern: Any, + max_results: Any = GLOB_DEFAULT_MAX_RESULTS, +) -> str: + """Search the repository for files whose path matches a glob. + + pattern: a pathlib-style glob. Examples: + "**/*.py" all Python files + "src/**/*.tsx" every .tsx under src + "**/test_*.py" all pytest files + "README*" top-level README files + max_results: hard cap on the number of paths returned (default 200, + max 1000). When the cap is hit the result is annotated so the + caller can refine. + + Output: one path per line. Path-only — no contents. Use + "Read file content" afterwards if you need bytes. 
+ """ + pattern = _sanitize_tool_arg(pattern, fallback_key="pattern") or "**/*" + cap = max(1, min(GLOB_HARD_MAX_RESULTS, _coerce_int(max_results, GLOB_DEFAULT_MAX_RESULTS))) try: owner, repo, token, branch = get_repo_context() loop = asyncio.new_event_loop() asyncio.set_event_loop(loop) try: - # Pass token + ref explicitly - content = loop.run_until_complete(get_file(owner, repo, file_path, token=token, ref=branch)) + tree = loop.run_until_complete(get_repo_tree(owner, repo, token=token, ref=branch)) finally: loop.close() + if not tree: + return f"Repository is empty - no files. (Branch: {branch})" + + # ``fnmatch`` understands `*`/`?`/`[…]` but treats `**` as a + # plain star. Translate `**` → match-any-segments by walking + # the pattern manually for a tighter match on the common case. + paths = [item["path"] for item in tree] + matches = _glob_match(paths, pattern) + truncated = False + if len(matches) > cap: + matches = matches[:cap] + truncated = True + + if not matches: + return f"No files matched pattern: {pattern}\n(Branch: {branch}, total files: {len(paths)})" + + header = f"Repository: {owner}/{repo} (Branch: {branch})\nMatching: {pattern}\n" + body = "\n".join(f" - {p}" for p in sorted(matches)) + footer = f"\n…{cap}+ matches truncated. 
Refine the pattern.\n" if truncated else "" + return f"{header}{body}{footer}" + except Exception as e: + return f"Error globbing files: {str(e)}" + + +import re as _re + + +def _glob_to_regex(pattern: str) -> "_re.Pattern[str]": + """Translate a shell-style glob into a regex with proper `/`-aware + semantics — the same contract Claude Code, ripgrep and bash use: + + * ``*`` matches anything **except** ``/`` + * ``**`` matches anything **including** ``/`` (any number of segments) + * ``?`` matches exactly one non-``/`` character + * ``[abc]`` character class (passed through to regex) + * everything else is literal + + The result is anchored with ``\\A`` and ``\\Z`` so it must match the + full path — ``*.py`` will not falsely match ``src/foo.py``. + """ + out: list[str] = [] + i = 0 + while i < len(pattern): + c = pattern[i] + if c == "*": + if i + 1 < len(pattern) and pattern[i + 1] == "*": + # `**` — match any number of full segments. When the + # following character is `/` consume it as part of the + # match (so `**/foo.py` correctly matches `foo.py` + # at the repo root). + if i + 2 < len(pattern) and pattern[i + 2] == "/": + out.append("(?:.*/)?") + i += 3 + continue + out.append(".*") + i += 2 + continue + out.append("[^/]*") + i += 1 + elif c == "?": + out.append("[^/]") + i += 1 + elif c == ".": + out.append(r"\.") + i += 1 + elif c == "[": + # Character class — pass through up to the matching ']'. 
+ j = pattern.find("]", i + 1) + if j == -1: + out.append(r"\[") + i += 1 + else: + out.append(pattern[i : j + 1]) + i = j + 1 + else: + out.append(_re.escape(c)) + i += 1 + return _re.compile(r"\A" + "".join(out) + r"\Z") + + +def _glob_match(paths: List[str], pattern: str) -> List[str]: + """Match paths against a glob with `/`-aware semantics.""" + rx = _glob_to_regex(pattern) + return [p for p in paths if rx.match(p)] + + +def _fetch_file_content(file_path: str) -> str | None: + """Fetch a file from the active repository using the current context.""" + owner, repo, token, branch = get_repo_context() + + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + try: + return loop.run_until_complete( + get_file(owner, repo, file_path, token=token, ref=branch) + ) + finally: + loop.close() + + +@tool("Read file content") +def read_file(file_path: Any) -> str: + """Read the content of a file from the active repository. + + file_path: the file's path relative to the repository root, e.g. + "README.md" or "src/main.py". Pass a plain string — do **not** pass + a dict like {"description": "...", "type": "str"}. + """ + file_path = _sanitize_tool_arg(file_path) + try: + content = _fetch_file_content(file_path) return f"Content of {file_path}:\n---\n{content}\n---" except Exception as e: return f"Error reading file {file_path}: {str(e)}" +@tool("Read file content window") +def read_file_window( + file_path: Any, + offset: Any = 0, + limit: Any = READ_DEFAULT_LIMIT, +) -> str: + """Read a line window from a file in the active repository. + + This advanced pagination tool is intentionally not included in the + default repository tool list. Keep the primary "Read file content" + tool's schema simple for smaller ReAct models. + + file_path: the file's path relative to the repository root. + offset: 0-indexed line number to start reading from. + limit: maximum number of lines to return (default 2000, max 10000). 
+ """ + file_path = _sanitize_tool_arg(file_path) + start = max(0, _coerce_int(offset, 0)) + span = max(1, min(READ_MAX_LIMIT, _coerce_int(limit, READ_DEFAULT_LIMIT))) + try: + content = _fetch_file_content(file_path) + if content is None: + return f"Error reading file {file_path}: empty response" + + lines = content.splitlines() + total = len(lines) + if total == 0: + return f"Content of {file_path}:\n---\n(empty file)\n---" + + end = min(total, start + span) + slice_text = "\n".join(lines[start:end]) + + header = f"Content of {file_path}" + if start > 0 or end < total: + header += f" (lines {start + 1}-{end} of {total})" + + footer = "" + if end < total: + remaining = total - end + footer = ( + f"\n…{remaining} more lines. Continue with offset={end} " + f"to read further." + ) + return f"{header}:\n---\n{slice_text}\n---{footer}" + except Exception as e: + return f"Error reading file {file_path}: {str(e)}" + + @tool("Get repository summary") def get_repository_summary() -> str: """Provides a comprehensive summary of the repository.""" @@ -247,6 +447,137 @@ def get_repository_summary() -> str: # Write tools — allow agents to create, update, and delete files via GitHub API # --------------------------------------------------------------------------- +@tool("Edit a section of a file (exact string replacement)") +def edit_file( + file_path: Any, + old_string: Any, + new_string: Any, + commit_message: Any, + expected_occurrences: Any = 1, +) -> str: + """Surgical edit — replace a small section of a file without + re-emitting the rest. Use this whenever you want to fix a bug, + rename a symbol, or insert a few lines into a file that already + exists. Never use ``Write or update a file`` to apply a small + change — that requires re-emitting the whole file and corrupts + long files on small-context models. + + file_path: path relative to the repo root. Plain string. 
+ old_string: the exact text to find — including surrounding + indentation and (where needed) preceding/trailing context + so the match is unique. Plain string. + new_string: the replacement text. Plain string. Pass an empty + string to delete the matched block. + commit_message: short imperative commit summary. + expected_occurrences: how many times old_string is expected to + appear in the file. Default 1. Pass a higher number to + rename an identifier that appears N times; pass -1 to allow + any positive number. When the actual count differs, the + edit is refused — widen old_string to disambiguate. + + On success returns "File '' edited (N occurrence(s) replaced). + Commit: ". On failure returns an actionable error message + starting with "Error:". + """ + from .edit_backend import EditError, apply_edit + from .github_api import get_file, put_file + + file_path = _sanitize_tool_arg(file_path) + old_string_s = old_string if isinstance(old_string, str) else _sanitize_tool_arg(old_string, fallback_key="value") + new_string_s = new_string if isinstance(new_string, str) else _sanitize_tool_arg(new_string, fallback_key="value") + commit_message_s = _sanitize_tool_arg(commit_message, fallback_key="value") or f"Edit {file_path}" + expected = _coerce_int(expected_occurrences, 1) + + try: + owner, repo, token, branch = get_repo_context() + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + try: + current = loop.run_until_complete( + get_file(owner, repo, file_path, token=token, ref=branch) + ) + new_content, report = apply_edit( + current or "", + old_string=old_string_s, + new_string=new_string_s, + expected_occurrences=expected, + ) + result = loop.run_until_complete( + put_file(owner, repo, file_path, new_content, commit_message_s, token=token, branch=branch) + ) + finally: + loop.close() + + sha = result.get("commit_sha", "") + return ( + f"File '{file_path}' edited " + f"({report.occurrences_replaced} occurrence(s) replaced, " + f"{report.bytes_before} 
→ {report.bytes_after} bytes). " + f"Commit: {sha[:8]}" + ) + except EditError as e: + # User-facing — keep the original message so the agent can + # widen the context and retry. + return f"Error: {e}" + except Exception as e: + return f"Error editing file {file_path}: {e}" + + +@tool("Apply a unified diff to a file") +def apply_patch_to_file( + file_path: Any, + diff: Any, + commit_message: Any, +) -> str: + """Apply a unified-diff patch to a single file. Use this when the + change involves several non-contiguous edits inside one file and + a single ``Edit a section of a file`` call wouldn't capture all + of them cleanly. + + file_path: path relative to the repo root. + diff: a single-file unified diff with one or more @@-hunks. The + helper matches each hunk by *context lines* (the leading-space + lines around the change), so line numbers can be stale. + Multi-file diffs are not accepted — split them first. + commit_message: short imperative commit summary. + + Returns the same shape as ``Edit a section of a file``. 
+ """ + from .edit_backend import EditError, apply_unified_diff + from .github_api import get_file, put_file + + file_path = _sanitize_tool_arg(file_path) + diff_s = diff if isinstance(diff, str) else _sanitize_tool_arg(diff, fallback_key="value") + commit_message_s = _sanitize_tool_arg(commit_message, fallback_key="value") or f"Patch {file_path}" + + try: + owner, repo, token, branch = get_repo_context() + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + try: + current = loop.run_until_complete( + get_file(owner, repo, file_path, token=token, ref=branch) + ) + new_content, report = apply_unified_diff(current or "", diff_s) + result = loop.run_until_complete( + put_file(owner, repo, file_path, new_content, commit_message_s, token=token, branch=branch) + ) + finally: + loop.close() + + sha = result.get("commit_sha", "") + return ( + f"File '{file_path}' patched " + f"({report.occurrences_replaced} hunk(s) applied, " + f"{report.bytes_before} → {report.bytes_after} bytes). " + f"Commit: {sha[:8]}" + ) + except EditError as e: + return f"Error: {e}" + except Exception as e: + return f"Error patching file {file_path}: {e}" + + @tool("Write or update a file in the repository") def write_file(file_path: Any, content: Any, commit_message: Any) -> str: """Create or update a file in the repository. @@ -332,5 +663,162 @@ def create_repo_branch(branch_name: str) -> str: # Export tools -REPOSITORY_TOOLS = [list_repository_files, get_directory_structure, read_file, get_repository_summary] -WRITE_TOOLS = [write_file, delete_repo_file, create_repo_branch] +@tool("Search file contents") +def grep_repository( + pattern: Any, + path_pattern: Any = None, + case_insensitive: Any = False, + max_results: Any = 100, +) -> str: + """Search the repository for a regex pattern across file contents. + + pattern: a Python-style regular expression. Use this when you need + to find a symbol, string, import, or any other content that + listing/globbing won't reveal. 
+ path_pattern: optional glob to scope the search (e.g. "**/*.py", + "src/**/*.ts"). Same `/`-aware semantics as + "Find files matching a pattern". + case_insensitive: pass true to match regardless of case. + max_results: hard cap (default 100, max 500). Beyond the cap the + result is annotated so you can narrow the search. + + Output: one match per line, formatted ``path:line: matched_text``. + """ + from .grep_backend import ( + GREP_DEFAULT_MAX_RESULTS, + format_result, + grep, + ) + + pattern_str = _sanitize_tool_arg(pattern, fallback_key="pattern") or "" + if not pattern_str: + return "Error: empty search pattern" + path_filter_str = path_pattern if isinstance(path_pattern, str) else None + ci_flag = bool(case_insensitive) if not isinstance(case_insensitive, dict) else False + cap = _coerce_int(max_results, GREP_DEFAULT_MAX_RESULTS) + + try: + owner, repo, token, branch = get_repo_context() + + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + try: + tree = loop.run_until_complete(get_repo_tree(owner, repo, token=token, ref=branch)) + finally: + loop.close() + + if not tree: + return f"Repository is empty - no files to search. (Branch: {branch})" + + # Pre-filter file list by path glob BEFORE fetching contents — + # this is the single biggest cost saving on GitHub-backed repos. + paths = [item["path"] for item in tree] + if path_filter_str: + paths = _glob_match(paths, path_filter_str) + if not paths: + return ( + f"No files matched path_pattern: {path_filter_str}\n" + f"(Branch: {branch}, total files: {len(tree)})" + ) + + # Cap the number of files we fetch — at 200 paths × ~50 KB each + # that's already 10 MB. Anything beyond is the caller's job + # to narrow with a tighter path_pattern. + FILE_FETCH_CAP = 200 + paths = paths[:FILE_FETCH_CAP] + + # Fetch contents concurrently. ``get_file`` is async so we batch. 
+ loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + try: + async def _gather(): + import asyncio as _aio + async def _fetch(p): + try: + return p, await get_file(owner, repo, p, token=token, ref=branch) + except Exception: + return p, None + return await _aio.gather(*(_fetch(p) for p in paths)) + results = loop.run_until_complete(_gather()) + finally: + loop.close() + + files = {p: c for p, c in results if isinstance(c, str)} + if not files: + return f"Could not fetch any matching files. (Tried {len(paths)} paths.)" + + rx_path_filter = _glob_to_regex(path_filter_str) if path_filter_str else None + result = grep( + files, + pattern_str, + case_insensitive=ci_flag, + max_results=cap, + path_filter=rx_path_filter, + ) + return format_result(result, pattern=pattern_str) + except Exception as e: + return f"Error in grep_repository: {str(e)}" + + +@tool("Find code by semantic search") +def semantic_search(query: Any, k: Any = 8) -> str: + """Find the most semantically-similar code chunks for a natural- + language query. Powered by a local on-prem RAG index (ChromaDB + + MiniLM-L6-v2 by default; pure-Python hashing fallback when the + model isn't available). + + query: what you want to find, in natural language. Example + queries: "authentication middleware", "where do we parse the + plan response", "the function that talks to OpenAI". + k: how many results to return (default 8, max 20). + + Output: one chunk per result, formatted as ``path:start-end`` + plus a short excerpt. Returns "No matches" silently when the + index hasn't been built yet — fall back to grep / glob in that + case. + + Gated behind the ``rag_retrieval`` flag — when off this tool + isn't registered with the agent at all. + """ + from . import flags + from .rag import FLAG_RAG_RETRIEVAL, retrieve_top_k + + if not flags.is_on(FLAG_RAG_RETRIEVAL, default=False): + return "Semantic search is disabled. Enable the rag_retrieval flag and build the index first." 
+ + q = _sanitize_tool_arg(query, fallback_key="query") or "" + if not q: + return "Error: empty search query" + kk = max(1, min(20, _coerce_int(k, 8))) + try: + owner, repo, token, branch = get_repo_context() + hits = retrieve_top_k(q, owner=owner, repo=repo, branch=branch or "HEAD", k=kk) + if not hits: + return ( + f"No semantic matches for: {q}\n" + "Either the index hasn't been built yet, or no chunks " + "matched. Try the 'Search file contents' tool instead." + ) + lines = [f"Top {len(hits)} semantic match(es) for: {q}"] + for h in hits: + excerpt = h.text.replace("\n", " ").strip()[:200] + lines.append(f" {h.path}:{h.start_line}-{h.end_line} (score={h.score:.2f})") + lines.append(f" {excerpt}") + return "\n".join(lines) + except Exception as e: + return f"Error in semantic_search: {str(e)}" + + +REPOSITORY_TOOLS = [ + list_repository_files, + get_directory_structure, + read_file, + get_repository_summary, +] +WRITE_TOOLS = [ + edit_file, # B8: surgical exact-string replacement + apply_patch_to_file, # B8: unified-diff patch + write_file, + delete_repo_file, + create_repo_branch, +] diff --git a/gitpilot/agentic.py b/gitpilot/agentic.py index 5c30ff9..22f2883 100644 --- a/gitpilot/agentic.py +++ b/gitpilot/agentic.py @@ -154,6 +154,82 @@ def _crewai(): _tools_cache: dict = {} +async def _execute_index_action( + owner: str, repo: str, *, token: str | None, branch_name: str | None, +) -> str: + """Handle the ``INDEX`` plan-step pseudo-action (Batch B9). + + Triggers a one-time RAG index build for the active repo: + fetches every file via the GitHub tree, runs them through the + chunker / embedder, persists the ChromaDB collection, and grants + per-repo consent so future fuzzy queries auto-build incrementally. + + Returns a one-line summary suitable for the execution-log step + output. Failures are surfaced as their own line; we never raise + because that would abort sibling steps in the same plan. 
+ """ + from .github_api import get_file, get_repo_tree + from .rag.indexer import build_index_from_files + from .rag_consent import grant_consent + + try: + tree = await get_repo_tree(owner, repo, token=token, ref=branch_name) + except Exception as exc: + logger.warning("[index] could not list repo tree: %s", exc) + return f"! Failed to list repo for indexing: {exc}" + + paths = [item["path"] for item in (tree or []) if item.get("path")] + if not paths: + return "i Repo is empty — nothing to index." + + # Cap how many files we'll embed in one user-approved build to + # bound time + disk. A repo over the cap still produces a usable + # index covering the first INDEX_FETCH_CAP paths in tree order; the + # rest can be added incrementally on subsequent builds. + INDEX_FETCH_CAP = 500 + paths = paths[:INDEX_FETCH_CAP] + + async def _fetch(p: str) -> tuple[str, str | None]: + try: + return p, await get_file(owner, repo, p, token=token, ref=branch_name) + except Exception: + return p, None + + import asyncio as _aio + results = await _aio.gather(*(_fetch(p) for p in paths)) + files: list[tuple[str, str]] = [ + (p, c) for p, c in results if isinstance(c, str) and c + ] + if not files: + return "! Could not fetch any repo files for indexing." + + # Build synchronously inside the await — embedding is CPU-bound + # and we want the user to see "indexing complete" before the + # next plan step runs. + try: + report = build_index_from_files( + files, + owner=owner, + repo=repo, + branch=branch_name or "HEAD", + ) + except Exception as exc: + logger.warning("[index] build failed: %s", exc) + return f"! Index build failed: {exc}" + + try: + grant_consent(owner, repo) + except Exception as exc: # pragma: no cover - defensive + logger.debug("[index] could not grant consent: %s", exc) + + return ( + f"+ Indexed {report.files_indexed} file(s) " + f"({report.chunks_added} chunks, embedder={report.embedder_name}, " + f"skipped={report.files_skipped}). 
" + f"Semantic search is now available for {owner}/{repo}." + ) + + def _tools(): """Return cached tool collections (lazy-loaded on first use).""" if not _tools_cache: @@ -185,9 +261,18 @@ def _build_llm(): class PlanFile(BaseModel): - """Represents a file operation in a plan step.""" + """Represents a file operation in a plan step. + + ``INDEX`` (Batch B9) is a special pseudo-action: the ``path`` is + treated as a marker ("__repo__") rather than a real file, and the + executor branch triggers a one-time RAG index build for the active + repo. Surfaced as its own plan step so the user approves the + indexing cost (time + disk) just like any other action. + """ path: str - action: Literal["CREATE", "MODIFY", "DELETE", "READ"] = "MODIFY" + action: Literal[ + "CREATE", "MODIFY", "DELETE", "READ", "INDEX", + ] = "MODIFY" class PlanStep(BaseModel): @@ -262,9 +347,23 @@ async def generate_plan( repo_full_name: str, token: str | None = None, branch_name: str | None = None, + *, + routing_hint: str | None = None, + intent: str | None = None, ) -> PlanResult: """Agentic planning: create a structured plan but DO NOT modify the repo. + ``intent`` is the literal from :class:`gitpilot.query_router.RouterDecision` + (fix / find / info / create / delete / modify). When supplied AND + the ``lean_prompts`` flag is on, the planner's task description + uses only the rule block matching the intent — small models stop + drowning in irrelevant create-vs-delete-vs-modify rules. + + ``routing_hint`` is an optional pre-classified directive from + :mod:`gitpilot.query_router` that gets concatenated into the + planner's context_pack. Advisory — the planner can override + when context demands more exploration. 
+ Two-phase approach: 1) Explore and understand the repository (on the correct branch) 2) Create a plan based on actual repository state @@ -285,6 +384,12 @@ async def generate_plan( if context_pack: logger.info("[GitPilot] Context pack loaded (%d chars)", len(context_pack)) + # Batch B9 — append the API-layer router's strategy hint so the + # planner sees the recommended intent / target files / tool order. + if routing_hint: + context_pack = (context_pack or "") + ("\n\n" if context_pack else "") + routing_hint + logger.info("[GitPilot] Router hint injected (%d chars)", len(routing_hint)) + # PHASE 1: Explore repository (correct branch) logger.info("[GitPilot] Phase 1: Exploring repository %s (ref=%s)...", repo_full_name, active_ref) @@ -295,10 +400,49 @@ async def generate_plan( active_ref, ) + # Batch B6: pin a compact "repo map" into the planner's context. + # Same idea Aider, Cursor and Claude Code use — give the planner a + # high-level site map (key files + modules + language histogram) + # in <= 500 tokens, persisted to disk so we don't rebuild it on + # every turn. Best-effort: a failure here must never block the + # planner. + try: + from . 
import flags as _flags + from .repo_map import FLAG_REPO_MAP, build_repo_map + + if _flags.is_on(FLAG_REPO_MAP, default=True): + _all_files = list(repo_context_data.get("all_files") or []) + if _all_files: + _map = build_repo_map( + owner=owner, repo=repo, branch=active_ref or "HEAD", + paths=_all_files, + ) + if _map.agents_md: + context_pack = (context_pack or "") + ( + "\n\n" if context_pack else "" + ) + _map.agents_md + logger.info( + "[GitPilot] Repo map pinned (%d tokens, %d modules, %d key files)", + len(_map.agents_md.split()), # rough proxy + len(_map.modules), + len(_map.key_files), + ) + except Exception as _map_err: # pragma: no cover - defensive + logger.debug("[GitPilot] repo map injection skipped: %s", _map_err) + + # Batch B12 — when ``lean_prompts`` is on, every persona / task + # description is sourced from ``gitpilot.agent_prompts`` so prompt + # budgets are pinned by tests and never accidentally bloated. + from . import agent_prompts as _ap + + _lean = _ap.lean_prompts_enabled() + explorer = _crewai()["Agent"]( role="Repository Explorer", - goal="Thoroughly explore and document the current state of the repository", - backstory=( + goal=_ap.EXPLORER_GOAL if _lean else ( + "Thoroughly explore and document the current state of the repository" + ), + backstory=_ap.EXPLORER_BACKSTORY if _lean else ( "You are a meticulous code archaeologist who explores repositories " "to understand their complete structure before any changes are made. " "You use all available tools to build a comprehensive picture." 
@@ -309,8 +453,13 @@ async def generate_plan( allow_delegation=False, ) - explore_task = _crewai()["Task"]( - description=dedent(f""" + if _lean: + _explore_description = _ap.render_explorer_task( + repo_full_name=repo_full_name, active_ref=active_ref, + ) + _explore_expected = "A repository exploration report in the documented format" + else: + _explore_description = dedent(f""" Repository: {repo_full_name} Active Ref (branch/tag/SHA): {active_ref} @@ -338,8 +487,14 @@ async def generate_plan( File Types: [count files by extension] Your report MUST be based on ACTUAL tool calls, not assumptions. - """), - expected_output="A detailed exploration report listing ALL files found in the repository", + """) + _explore_expected = ( + "A detailed exploration report listing ALL files found in the repository" + ) + + explore_task = _crewai()["Task"]( + description=_explore_description, + expected_output=_explore_expected, agent=explorer, ) @@ -375,34 +530,64 @@ def _explore(): "request, or switch to a stronger LLM via Settings → Provider." ) from exc - exploration_report = exploration_result.raw if hasattr(exploration_result, "raw") else str(exploration_result) - logger.info("[GitPilot] Exploration complete. Report length: %s chars", len(exploration_report)) + exploration_report_raw = exploration_result.raw if hasattr(exploration_result, "raw") else str(exploration_result) + logger.info("[GitPilot] Exploration complete. Report length: %s chars", len(exploration_report_raw)) + + # Batch B5: protect the planner's context by compressing the + # explorer's free-form report into a fixed-budget summary. When + # the report already fits (small repos, small models) this is a + # no-op; on big repos it can shave 3–6 KB off the planner prompt + # without losing any concrete file paths. 
+ try: + from .explorer_summary import compress_exploration_report + + exploration_report, _exp_metrics = compress_exploration_report(exploration_report_raw) + if _exp_metrics.compressed_tokens < _exp_metrics.original_tokens: + logger.info( + "[GitPilot] Compressed exploration report: %d → %d tokens " + "(%d/%d files kept)", + _exp_metrics.original_tokens, + _exp_metrics.compressed_tokens, + _exp_metrics.files_kept, + _exp_metrics.files_in_original, + ) + except Exception as _exp_err: # pragma: no cover - defensive + logger.debug("[GitPilot] explorer compression failed: %s", _exp_err) + exploration_report = exploration_report_raw # PHASE 2: Plan creation based on exploration logger.info("[GitPilot] Phase 2: Creating plan based on repository exploration (ref=%s)...", active_ref) # Build planner backstory with optional context pack injection - _planner_backstory = ( - "You are an experienced staff engineer who creates plans based on FACTS, not assumptions. " - "You have received a complete exploration report of the repository. " - "You ONLY create plans for files that actually exist in the exploration report. " - "You are extremely careful with DELETE actions - you verify the file exists " - "and that it's not on the 'keep' list before marking it for deletion. " - "When users ask to delete files, you delete individual FILES, not directory names. " - "When users ask to ANALYZE files and GENERATE new content (code, docs, examples), " - "you create plans that READ existing files and CREATE new files with generated content. " - "You understand that 'analyze X and create Y' means: use tools to read X, then plan to CREATE Y. " - "You never make changes yourself, only create detailed plans." - ) - if context_pack: + if _lean: + _planner_backstory = _ap.PLANNER_BACKSTORY + _planner_goal = _ap.PLANNER_GOAL + else: + _planner_backstory = ( + "You are an experienced staff engineer who creates plans based on FACTS, not assumptions. 
" + "You have received a complete exploration report of the repository. " + "You ONLY create plans for files that actually exist in the exploration report. " + "You are extremely careful with DELETE actions - you verify the file exists " + "and that it's not on the 'keep' list before marking it for deletion. " + "When users ask to delete files, you delete individual FILES, not directory names. " + "When users ask to ANALYZE files and GENERATE new content (code, docs, examples), " + "you create plans that READ existing files and CREATE new files with generated content. " + "You understand that 'analyze X and create Y' means: use tools to read X, then plan to CREATE Y. " + "You never make changes yourself, only create detailed plans." + ) + _planner_goal = ( + "Design safe, step-by-step refactor plans based on ACTUAL repository state " + "discovered during exploration" + ) + # context_pack additions (B6 repo map + B9 routing hint) are only + # appended in non-lean mode; on small models they bloat the prompt + # and push the JSON-schema rules out of the attention window. + if context_pack and not _lean: _planner_backstory += "\n\n" + context_pack planner = _crewai()["Agent"]( role="Repository Refactor Planner", - goal=( - "Design safe, step-by-step refactor plans based on ACTUAL repository state " - "discovered during exploration" - ), + goal=_planner_goal, backstory=_planner_backstory, llm=llm, tools=_tools()["REPOSITORY_TOOLS"], @@ -410,8 +595,20 @@ def _explore(): allow_delegation=False, ) - plan_task = _crewai()["Task"]( - description=dedent(f""" + if _lean: + # Use the per-intent compact template from agent_prompts. + # Pass the verified file list directly so the planner sees the + # facts block at the bottom of the prompt — highest attention + # weight on small models. 
+ _plan_description = _ap.render_plan_task( + goal="{goal}", # CrewAI inputs substitution happens later + repo_full_name=repo_full_name, + active_ref=active_ref or "HEAD", + file_list=list(repo_context_data.get("all_files") or []), + intent=intent, + ) + else: + _plan_description = dedent(f""" User goal: {{goal}} Repository: {repo_full_name} Active Ref (branch/tag/SHA): {active_ref} @@ -485,7 +682,9 @@ def _explore(): - Do NOT wrap the JSON in markdown code fences - Do NOT add any explanation before or after the JSON - The ENTIRE response MUST be ONLY the JSON object, starting with '{{' and ending with '}}' - """), + """) + plan_task = _crewai()["Task"]( + description=_plan_description, expected_output=dedent(""" A single valid JSON object matching the PlanResult schema: - goal: string @@ -736,9 +935,19 @@ async def generate_plan_lite( repo_full_name: str, token: str | None = None, branch_name: str | None = None, + *, + routing_hint: str | None = None, + intent: str | None = None, ) -> PlanResult: """Lite Mode planning: smart intent detection + single agent + pre-fetched context. + ``routing_hint`` is accepted for signature parity with + :func:`generate_plan`. Lite Mode has its own simpler routing + via regex intent classification, so the hint is currently + treated as advisory metadata only — it does not change the + Lite planner's behaviour. Kept here so call sites can use a + single signature for both planners. + The topology is: 1. Classify intent (regex — instant, no LLM) 2. Pre-fetch repo context from GitHub API (no LLM tool-calling) @@ -1078,6 +1287,14 @@ def _modify(): elif file.action == "READ": step_summary += f"\n i Inspected {file.path}" + elif file.action == "INDEX": + # Batch B9 — INDEX is a special plan step that + # triggers the local RAG index build for this repo. 
+ summary_line = await _execute_index_action( + owner, repo, token=token, branch_name=branch_name, + ) + step_summary += f"\n {summary_line}" + except Exception as e: logger.exception("Lite: Error processing %s: %s", file.path, e) step_summary += f"\n ! Error: {file.path}: {e}" @@ -1129,10 +1346,16 @@ async def execute_plan( # CRITICAL: ensure tools read from the ACTIVE execution branch _tools()["set_repo_context"](owner, repo, token=token, branch=branch_name) + # Batch B12 — lean persona from agent_prompts when the flag is on. + from . import agent_prompts as _ap + _lean_writer = _ap.lean_prompts_enabled() + code_writer = _crewai()["Agent"]( role="Expert Code Writer", - goal="Generate high-quality, production-ready code and documentation based on requirements.", - backstory=( + goal=_ap.CODE_WRITER_GOAL if _lean_writer else ( + "Generate high-quality, production-ready code and documentation based on requirements." + ), + backstory=_ap.CODE_WRITER_BACKSTORY if _lean_writer else ( "You are a senior software engineer with expertise in multiple programming languages. " "You write clean, well-documented, and functional code. " "You understand context and generate appropriate content for each file type. " @@ -1155,8 +1378,14 @@ async def execute_plan( for file in step.files: try: if file.action == "CREATE": - create_task = _crewai()["Task"]( - description=( + if _lean_writer: + _create_description = _ap.render_create_file_task( + file_path=file.path, + goal=plan.goal, + step_description=step.description, + ) + else: + _create_description = ( f"Generate complete content for a new file: {file.path}\n\n" f"Overall Goal: {plan.goal}\n" f"Step Context: {step.description}\n\n" @@ -1177,8 +1406,10 @@ async def execute_plan( "- Do NOT include placeholder comments like 'TODO' or 'IMPLEMENT THIS'\n" "- The content should be fully functional and informative\n\n" "Return ONLY the file content, no explanations or markdown code blocks." 
- ), - expected_output=f"Complete, production-ready content for {file.path}", + ) + create_task = _crewai()["Task"]( + description=_create_description, + expected_output=f"Complete content for {file.path}", agent=code_writer, ) @@ -1302,6 +1533,13 @@ def _modify(): elif file.action == "READ": step_summary += f"\n ℹ️ READ-only: inspected {file.path}" + elif file.action == "INDEX": + # Batch B9 — triggers the per-repo RAG index build. + summary_line = await _execute_index_action( + owner, repo, token=token, branch_name=branch_name, + ) + step_summary += f"\n {summary_line}" + except Exception as e: # noqa: BLE001 logger.exception( "Error processing file %s in step %s: %s", @@ -1440,11 +1678,18 @@ def _build_terminal_agent(llm) -> Agent: role="Terminal & Shell Executor", goal="Execute shell commands safely in the workspace and report results", backstory=( - "You are a terminal expert that runs shell commands in a sandboxed " - "environment. You can run tests, linters, build tools, and other " - "development commands. You always report exit codes and output. " - "You refuse to run destructive commands like rm -rf / or format disks. " - "You explain command output clearly to the user." + "You are a terminal expert that runs shell commands in the " + "sandbox the user picked in Settings (local subprocess by " + "default, MatrixLab for containerised enterprise isolation). " + "Both run_command and run_in_sandbox route through the same " + "backend, so the user's runtime choice applies to your " + "autonomous loop too — not just to the Run button in chat. " + "Use run_command for workspace commands (tests, linters, " + "builds) and run_in_sandbox(language, code) when you want " + "to validate a self-contained snippet before returning it. " + "Always report the exit code and surface stderr verbatim " + "when a run fails: the trace is your debugging signal. " + "You refuse destructive commands like 'rm -rf /' or 'mkfs'. 
" ), llm=llm, tools=_tools()["LOCAL_SHELL_TOOLS"] + _tools()["LOCAL_GIT_TOOLS"], diff --git a/gitpilot/api.py b/gitpilot/api.py index 0107bce..75fcc85 100644 --- a/gitpilot/api.py +++ b/gitpilot/api.py @@ -311,6 +311,17 @@ def _env_bool(name: str, default: bool) -> bool: except Exception: # noqa: BLE001 logger.exception("MCP admin API failed to mount; tab will show as unavailable") +# Sandbox runtime API (Settings → Sandbox runtime, Run button on chat +# code blocks). Mounting is non-fatal so a partial deployment can still +# serve chat / planner endpoints if this module fails to import. +try: + from .sandbox_api import router as sandbox_router + + app.include_router(sandbox_router) + logger.info("Sandbox API enabled (mounting /api/sandbox/* endpoints)") +except Exception: # noqa: BLE001 + logger.exception("Sandbox API failed to mount; Run button will be disabled") + # GitPilot-as-MCP-server (turns GitPilot into an MCP server other agents # can drive). Off by default; mount only when GITPILOT_EXPOSE_MCP_SERVER=true. try: @@ -591,7 +602,32 @@ def _build_local_repo_aware_prompt(req, session) -> str: "- Output the COMPLETE file content, not just a snippet.\n" "- For edits to existing files, output the full updated file.\n" "- Be explicit about which files to create or modify and why.\n" - "- Prefer incremental, production-safe changes over large rewrites." 
+ "- Prefer incremental, production-safe changes over large rewrites.\n" + "\n" + "RUNNABLE EXAMPLES (separate from file-output fences):\n" + "When the user asks for a small example they could try out — " + "\"write a hello-world\", \"give me a snippet that ...\", " + "\"show me how to call X\" — emit the example as a fenced block " + "with ONLY the language on the opening line (no filepath):\n" + "\n" + " ```python\n" + " print('Hello, world!')\n" + " ```\n" + "\n" + " ```javascript\n" + " console.log('Hello, world!');\n" + " ```\n" + "\n" + " ```bash\n" + " echo 'Hello, world!'\n" + " ```\n" + "\n" + "The chat UI shows a per-block ▶ Run button next to these " + "snippets and executes them in the user's selected sandbox " + "(local subprocess or MatrixLab). Supported languages: python, " + "javascript (or js/node), bash (or sh/shell). Keep snippets " + "self-contained — they run in a fresh tempdir with no project " + "files mounted — and short enough to read at a glance." ) sections = [system_block] @@ -701,6 +737,10 @@ class SettingsResponse(BaseModel): ollabridge: dict langflow_url: str has_langflow_plan_flow: bool + # Sandbox runtime selection — populated by settings_response_from. The + # field is Optional so older serialised payloads continue to validate + # even though the runtime always writes a value today. + sandbox: Optional[dict] = None class ProviderModelsResponse(BaseModel): @@ -718,6 +758,16 @@ class ChatPlanRequest(BaseModel): repo_name: str goal: str branch_name: Optional[str] = None + # Optional: when present, the planner invocation is recorded as a + # Task on the active session so the right-sidebar Tasks panel can + # trace it. Older frontends that omit this field continue to work + # — no task is recorded, no error raised. + session_id: Optional[str] = None + # Batch B9: set by the post-Reject "retry with grep" path so the + # router suppresses RAG / INDEX recommendations on the next + # attempt of the same goal. 
Default False — older frontends are + # unaffected. + force_no_rag: bool = False class ExecutePlanRequest(BaseModel): @@ -725,6 +775,12 @@ class ExecutePlanRequest(BaseModel): repo_name: str plan: PlanResult branch_name: Optional[str] = None + # Optional: when present, the active session's `branch` (and the + # matching `repos[i].branch`) is updated to the branch the executor + # actually wrote to, so reopening the session jumps to that branch + # instead of the one it was created on. Older frontends that omit + # this field continue to work — no session update is attempted. + session_id: Optional[str] = None class AuthUrlResponse(BaseModel): @@ -1017,6 +1073,13 @@ async def api_put_file( # ============================================================================ def settings_response_from(s: AppSettings) -> SettingsResponse: + sandbox_dump = s.sandbox.model_dump() + # Strip the secret value before it leaves the process — the frontend + # only needs to know whether a token is configured, not the token + # itself. Keeps GET /api/settings safe to log and to surface in the + # browser devtools. + token = sandbox_dump.pop("matrixlab_token", "") + sandbox_payload = {**sandbox_dump, "has_token": bool(token)} return SettingsResponse( provider=s.provider, providers=[ @@ -1033,6 +1096,7 @@ def settings_response_from(s: AppSettings) -> SettingsResponse: ollabridge=s.ollabridge.model_dump(), langflow_url=s.langflow_url, has_langflow_plan_flow=bool(s.langflow_plan_flow_id), + sandbox=sandbox_payload, ) @@ -1213,8 +1277,110 @@ async def api_context_usage(session_id: Optional[str] = Query(None)): # Chat Endpoints # ============================================================================ + +def _track_task(*, kind: str, title_fn=None): + """Decorator: wrap a chat endpoint so its run is recorded as a Task + on the active session (right-sidebar trace). + + Reads ``session_id`` directly off the request model. 
``title_fn`` + is a small callable that derives the human title from the request + object — keeps the decorator decoupled from any specific schema. + Endpoints whose requests don't carry a session_id behave exactly + as before — no Task is recorded, no error is raised. + """ + import functools + + from .task_recorder import begin_task as _begin_task + from .task_recorder import finish_task as _finish_task + + def _default_title(_req): + return kind.title() + + extract_title = title_fn or _default_title + + def deco(handler): + @functools.wraps(handler) + async def wrapper(req, *args, **kwargs): + session_id = getattr(req, "session_id", None) + try: + raw_title = extract_title(req) + except Exception: + raw_title = None + title = (raw_title or kind.title())[:160] + task = _begin_task(_session_mgr, session_id, kind=kind, title=title) + status = "failed" + err: Optional[str] = None + try: + result = await handler(req, *args, **kwargs) + status = "completed" + return result + except HTTPException as exc: + # HTTPException paths are still "failed" from the + # tasks-panel point of view (the user did not get a + # plan / commit). Preserve the detail as the error. + err = str(exc.detail) if exc.detail else None + raise + except Exception as exc: + err = str(exc) + raise + finally: + _finish_task( + _session_mgr, + session_id, + task, + status=status, + error=err, + ) + return wrapper + return deco + + +def _maybe_compact_session_for_request(session_id: Optional[str]) -> None: + """Best-effort auto-compaction hook (Batch B3). + + Called at the start of /api/chat/plan + /api/chat/execute. If the + persisted session is over 70 % of the active model's context + window, fold the older messages into a single summary entry and + record a Task row so the user sees what happened. A failure here + must never block the agent run. 
+ """ + if not session_id: + return + try: + from .auto_compact import maybe_compact_session + from .context_meter import resolve_context_window + from .task_recorder import begin_task, finish_task + + s = get_settings() + window = resolve_context_window(s) + report = maybe_compact_session( + _session_mgr, session_id, context_window=window + ) + if report.compacted: + # Surface the compaction in the right-sidebar trace so the + # operator can see "Conversation summarised 24 → 1" rather + # than wonder where their messages went. + task = begin_task( + _session_mgr, session_id, + kind="compact", + title=( + f"Compacted: {report.messages_folded} older messages " + f"({report.before_tokens} → {report.after_tokens} tokens)" + ), + ) + finish_task( + _session_mgr, session_id, task, + status="completed", + prompt_tokens=report.after_tokens, + ) + except Exception as exc: # pragma: no cover - defensive + logger.debug("[compact] hook failed: %s", exc) + + @app.post("/api/chat/plan") +@_track_task(kind="plan", title_fn=lambda req: req.goal) async def api_chat_plan(req: ChatPlanRequest, authorization: Optional[str] = Header(None)): + _maybe_compact_session_for_request(req.session_id) token = get_github_token(authorization) logger.info( @@ -1230,8 +1396,57 @@ async def api_chat_plan(req: ChatPlanRequest, authorization: Optional[str] = Hea # Use lite planner when Lite Mode is active (setting OR topology) planner = generate_plan_lite if _is_lite_mode_active() else generate_plan + # Batch B9 — deterministic query router. Runs BEFORE the LLM + # so even small models that pick poorly without guidance see + # a strategy hint up front. Best-effort: any failure falls + # back to today's no-hint behaviour rather than 500-ing. + routing_hint = None + routing_intent: Optional[str] = None + try: + from . 
import flags as _flags + if _flags.is_on("query_router", default=True): + from .query_router import classify, render_planner_hint + from .rag_consent import has_consent + + # Cheap path: a flat list of repo files for the + # classifier's path-verification step. Failure is + # tolerated — router falls back to "no targets". + repo_paths: list[str] = [] + try: + from .github_api import get_repo_tree + _tree = await get_repo_tree( + req.repo_owner, req.repo_name, + token=token, ref=req.branch_name, + ) + repo_paths = [t["path"] for t in (_tree or []) if t.get("path")] + except Exception: + pass + + rag_index_present = ( + has_consent(req.repo_owner, req.repo_name) + ) + + decision = classify( + req.goal, + repo_files=repo_paths, + rag_index_exists=rag_index_present, + force_no_rag=bool(req.force_no_rag), + ) + routing_hint = render_planner_hint(decision) + routing_intent = decision.intent + logger.info("[router] %s", decision.rationale) + except Exception as _route_err: # pragma: no cover - defensive + logger.debug("[router] skipped: %s", _route_err) + routing_hint = None + routing_intent = None + try: - plan = await planner(req.goal, full_name, token=token, branch_name=req.branch_name) + plan = await planner( + req.goal, full_name, + token=token, branch_name=req.branch_name, + routing_hint=routing_hint, + intent=routing_intent, + ) return plan except Exception as exc: error_msg = str(exc) @@ -1308,6 +1523,8 @@ async def api_chat_plan(req: ChatPlanRequest, authorization: Optional[str] = Hea full_name, token=token, branch_name=req.branch_name, + routing_hint=routing_hint, + intent=routing_intent, ) except Exception as lite_exc: logger.exception( @@ -1343,10 +1560,15 @@ async def api_chat_plan(req: ChatPlanRequest, authorization: Optional[str] = Hea @app.post("/api/chat/execute") +@_track_task( + kind="execute", + title_fn=lambda req: getattr(getattr(req, "plan", None), "goal", None) or "Execute plan", +) async def api_chat_execute( req: ExecutePlanRequest, 
authorization: Optional[str] = Header(None) ): + _maybe_compact_session_for_request(req.session_id) token = get_github_token(authorization) with execution_context(token, ref=req.branch_name): @@ -1408,6 +1630,39 @@ async def api_chat_execute( "mode", "sticky" if req.branch_name else "hard-switch", ) + + # Persist the branch the executor actually wrote to onto the + # session record so reopening this session jumps back to that + # branch (instead of the master/default it was created on). + # Best-effort: a failure to update the session must never block + # the user-facing execute result. + new_branch = ( + result.get("branch") if isinstance(result, dict) else None + ) or req.branch_name + if req.session_id and new_branch: + try: + session = _session_mgr.load(req.session_id) + session.branch = new_branch + # Multi-repo support: update the matching repos[] entry + # too if it exists, so callers that read from there see + # a consistent value. + if session.repos: + for entry in session.repos: + if entry.get("full_name") == full_name: + entry["branch"] = new_branch + _session_mgr.save(session) + except FileNotFoundError: + logger.debug( + "[exec] session %s not found — skipping branch persist", + req.session_id, + ) + except Exception as exc: # pragma: no cover - defensive + logger.warning( + "[exec] could not persist branch on session %s: %s", + req.session_id, + exc, + ) + return result @@ -3017,6 +3272,45 @@ async def api_get_session_messages(session_id: str): } +@app.get("/api/sessions/{session_id}/tasks") +async def api_get_session_tasks(session_id: str): + """Return the right-sidebar Tasks trace for one session. + + Read-only. Gated behind the ``tasks_sidebar`` flag — when off the + endpoint 404s so an old frontend can detect "feature absent" with + the same code path it uses for "session deleted". + """ + from . 
import flags + from .task_recorder import FLAG_TASKS_SIDEBAR + + if not flags.is_on(FLAG_TASKS_SIDEBAR, default=True): + raise HTTPException(status_code=404, detail="Tasks sidebar is disabled") + + try: + session = _session_mgr.load(session_id) + except FileNotFoundError: + raise HTTPException(status_code=404, detail="Session not found") + + return { + "session_id": session.id, + "tasks": [ + { + "id": t.id, + "kind": t.kind, + "title": t.title, + "status": t.status, + "started_at": t.started_at, + "completed_at": t.completed_at, + "duration_ms": t.duration_ms, + "prompt_tokens": t.prompt_tokens, + "completion_tokens": t.completion_tokens, + "error": t.error, + } + for t in session.tasks + ], + } + + @app.get("/api/sessions/{session_id}/diff") async def api_get_session_diff(session_id: str): """Get diff stats for a session (placeholder for sandbox integration).""" diff --git a/gitpilot/auto_compact.py b/gitpilot/auto_compact.py new file mode 100644 index 0000000..c434909 --- /dev/null +++ b/gitpilot/auto_compact.py @@ -0,0 +1,217 @@ +"""Auto-compaction of chat session history (Batch B3). + +When a session's persisted conversation crosses 70 % of the active +model's context window, we fold the older non-essential messages into +a single summary entry — same strategy Claude Code, Cursor and +Continue use to keep sessions usable across many turns without +silently truncating mid-stream. + +Design notes: + +* **Pure Python.** We reuse :mod:`gitpilot.context_budget`'s + deterministic ``_default_summariser`` so compaction never depends + on a live LLM call. Production deployments can later inject a + smarter summariser without changing this module's interface. +* **Append-only audit trail.** Each compaction also lands as a + ``kind="compact"`` Task in the right-sidebar trace so the user can + see "Conversation summarised: 24 messages → 1 summary". 
+* **Idempotent.** We tag the summary message with + ``metadata["compacted"] = "1"`` so a no-op pass over already-compact + history doesn't repeatedly fold the same content. +* **Best-effort.** Failure to load or save the session must never + block the user-facing endpoint — log and proceed. The chat + continues to work; it just won't shrink this turn. + +Wired in at the API boundary in :mod:`gitpilot.api` (``/api/chat/plan`` ++ ``/api/chat/execute``), so agentic.py is untouched. +""" +from __future__ import annotations + +import logging +from dataclasses import dataclass +from typing import Optional + +from . import flags +from .context_budget import ( + BudgetPolicy, + Message as BudgetMessage, + _default_summariser, + estimate_tokens, +) +from .session import Message as SessionMessage, Session, SessionManager + +logger = logging.getLogger(__name__) + +FLAG_AUTO_COMPACT = "auto_compact" + +# Tunable knobs. Centralised so future tuning (or per-provider +# overrides) doesn't require code changes in two places. 
+DEFAULT_CONDENSE_AT_RATIO = 0.70 # fire at 70 % of window +DEFAULT_KEEP_RECENT_TURNS = 6 # last N messages always preserved +DEFAULT_RESERVED_RESPONSE = 4_096 # mirror context_meter constant +COMPACTED_FLAG = "compacted" +SUMMARY_LABEL = "Conversation summary (older turns condensed)" + + +@dataclass +class CompactionReport: + """Returned by :func:`maybe_compact_session` so the caller can log + a Task entry with concrete before/after numbers.""" + compacted: bool = False + before_tokens: int = 0 + after_tokens: int = 0 + messages_folded: int = 0 + reason: Optional[str] = None # human-readable explanation + + +# ---------------------------------------------------------------------- +# Internal helpers +# ---------------------------------------------------------------------- + +def _budget_messages_from_session(session: Session) -> list[BudgetMessage]: + """Bridge SessionMessage → BudgetMessage.""" + out: list[BudgetMessage] = [] + for m in session.messages: + role = m.role if m.role in ("user", "assistant", "system", "tool") else "user" + importance = "pinned" if (m.metadata or {}).get(COMPACTED_FLAG) == "1" else "normal" + # Best-effort role narrowing — BudgetMessage role is a Literal. 
+ out.append( + BudgetMessage( + role=role, # type: ignore[arg-type] + content=m.content or "", + importance=importance, # type: ignore[arg-type] + ) + ) + return out + + +def _session_total_tokens(session: Session) -> int: + return sum(estimate_tokens(m.content or "") for m in session.messages) + + +# ---------------------------------------------------------------------- +# Public entry point +# ---------------------------------------------------------------------- + +def maybe_compact_session( + session_mgr: SessionManager, + session_id: Optional[str], + *, + context_window: int, + reserved_response: int = DEFAULT_RESERVED_RESPONSE, + condense_at_ratio: float = DEFAULT_CONDENSE_AT_RATIO, + keep_recent_turns: int = DEFAULT_KEEP_RECENT_TURNS, +) -> CompactionReport: + """Condense the session's history if it's crossed the threshold. + + Returns a :class:`CompactionReport` so the caller can record a + Task entry with the concrete numbers. A no-op report + (``compacted=False``) is returned silently when: + + * the feature flag is off, + * no session id was supplied, + * the session can't be loaded, + * we're below the threshold, + * there are not enough non-recent messages to fold. + """ + if not flags.is_on(FLAG_AUTO_COMPACT, default=True): + return CompactionReport(reason="flag off") + if not session_id: + return CompactionReport(reason="no session id") + if context_window <= 0: + return CompactionReport(reason="unknown context window") + + try: + session = session_mgr.load(session_id) + except Exception as exc: + logger.debug("[compact] session %s not loadable: %s", session_id, exc) + return CompactionReport(reason="session not loadable") + + before = _session_total_tokens(session) + # The user's *effective* budget excludes the reserved response + # headroom — that's the budget we actually need to keep below. 
+ effective_window = max(0, context_window - reserved_response) + threshold = int(effective_window * condense_at_ratio) + if before < threshold: + return CompactionReport( + compacted=False, + before_tokens=before, + after_tokens=before, + reason="below threshold", + ) + + # Fold using the existing deterministic summariser. We keep: + # - any message already marked compacted (pinned) + # - the last ``keep_recent_turns`` messages + # Everything else gets summarised into one system message. + msgs = session.messages + if len(msgs) <= keep_recent_turns + 1: + return CompactionReport( + compacted=False, + before_tokens=before, + after_tokens=before, + reason="not enough history to fold", + ) + + pinned = [m for m in msgs if (m.metadata or {}).get(COMPACTED_FLAG) == "1"] + rest = [m for m in msgs if (m.metadata or {}).get(COMPACTED_FLAG) != "1"] + keep_n = max(0, keep_recent_turns) + foldable = rest[:-keep_n] if keep_n else rest + kept = rest[-keep_n:] if keep_n else [] + + if not foldable: + return CompactionReport( + compacted=False, + before_tokens=before, + after_tokens=before, + reason="nothing foldable", + ) + + # Use BudgetMessage objects for the summariser — the existing + # summariser was written against that shape. 
+ budget_foldable = [ + BudgetMessage( + role=(m.role if m.role in ("user", "assistant", "system", "tool") else "user"), # type: ignore[arg-type] + content=m.content or "", + ) + for m in foldable + ] + summary_body = _default_summariser(budget_foldable) + summary_msg = SessionMessage( + role="system", + content=f"## {SUMMARY_LABEL}\n\n{summary_body}", + metadata={COMPACTED_FLAG: "1"}, + ) + + session.messages = pinned + [summary_msg] + kept + after = _session_total_tokens(session) + + try: + session_mgr.save(session) + except Exception as exc: # pragma: no cover - defensive + logger.warning("[compact] could not save session %s: %s", session_id, exc) + return CompactionReport( + compacted=False, + before_tokens=before, + after_tokens=after, + reason=f"save failed: {exc}", + ) + + return CompactionReport( + compacted=True, + before_tokens=before, + after_tokens=after, + messages_folded=len(foldable), + reason=f"folded {len(foldable)} older messages", + ) + + +__all__ = [ + "FLAG_AUTO_COMPACT", + "CompactionReport", + "DEFAULT_CONDENSE_AT_RATIO", + "DEFAULT_KEEP_RECENT_TURNS", + "DEFAULT_RESERVED_RESPONSE", + "SUMMARY_LABEL", + "maybe_compact_session", +] diff --git a/gitpilot/edit_backend.py b/gitpilot/edit_backend.py new file mode 100644 index 0000000..fde0c8f --- /dev/null +++ b/gitpilot/edit_backend.py @@ -0,0 +1,317 @@ +"""Surgical edit operations for the executor (Batch B8). + +Pure text-in / text-out functions — no GitHub / disk I/O. The agent +tool wrappers in :mod:`gitpilot.agent_tools` are responsible for +fetching the current file bytes (GitHub mode) or reading from disk +(local mode), passing them through these helpers, and then writing +the result back. + +Two operations: + +* :func:`apply_edit` — exact-string find-and-replace with + *strict occurrence validation*. The model passes a small + ``old_string`` and a small ``new_string``; we refuse to apply + unless ``old_string`` occurs exactly the expected number of times. 
+ Inspired by Claude Code's ``Edit`` tool — the contract that makes + fixing line 1 482 of a 2 000-line file reliable across any model. + +* :func:`apply_unified_diff` — parse a minimal subset of unified + diff and apply it by *matching the leading context lines* rather + than trusting the line numbers in the hunk header. Line numbers + drift the moment another edit lands; context survives. This is + the same trick Codex's ``apply_patch`` uses internally. + +Both functions raise :class:`EditError` with a precise, actionable +message rather than silently mis-editing. The executor must surface +that error to the user and refuse to commit. +""" +from __future__ import annotations + +import re +from dataclasses import dataclass +from typing import List, Optional, Sequence, Tuple + + +class EditError(ValueError): + """Raised when an edit cannot be applied safely. + + The message is user-facing — keep it concrete: file path, what + failed, what the caller can do about it. Never log a stack trace + in place of a clear sentence. + """ + + +@dataclass(frozen=True) +class EditReport: + """Returned alongside the new content so callers can record a + Task row with concrete numbers.""" + occurrences_replaced: int + bytes_before: int + bytes_after: int + + +# ---------------------------------------------------------------------- +# apply_edit — exact-string find-and-replace +# ---------------------------------------------------------------------- + +def apply_edit( + content: str, + *, + old_string: str, + new_string: str, + expected_occurrences: int = 1, +) -> Tuple[str, EditReport]: + """Replace ``old_string`` with ``new_string`` in ``content``. + + ``expected_occurrences`` is a *contract*: we will only apply the + edit when ``old_string`` appears in ``content`` exactly this many + times. Any deviation raises :class:`EditError` — agents must + disambiguate by widening ``old_string`` or specifying the right + count. 
+ + Pass ``expected_occurrences=-1`` to allow any positive number of + matches; useful for "rename this identifier everywhere". + + The function is pure: same inputs → same outputs, no I/O. + """ + if old_string is None: + raise EditError("apply_edit: old_string is required") + if new_string is None: + raise EditError("apply_edit: new_string is required (use empty string to delete)") + if old_string == new_string: + raise EditError( + "apply_edit: old_string and new_string are identical — " + "no edit would be applied. This is almost always a bug " + "in the planner; refuse rather than commit a no-op." + ) + + # Count occurrences without regex — old_string is treated as + # literal text, including whitespace and newlines. + if old_string == "": + raise EditError("apply_edit: old_string must not be empty") + + n = content.count(old_string) + if n == 0: + # Provide a short hint about why nothing matched: indentation + # mismatch is by far the most common cause on Python files. + hint = "" + if old_string.strip() and old_string.strip() in content: + hint = ( + " Hint: a stripped form of old_string IS present — " + "the indentation in your edit does not match the file. " + "Re-read the surrounding lines and copy them exactly." + ) + raise EditError( + "apply_edit: old_string was not found in the file." + hint + ) + + if expected_occurrences == -1: + # "replace all" mode — at least one match suffices. + pass + elif n != expected_occurrences: + raise EditError( + f"apply_edit: old_string occurs {n} time(s) in the file, " + f"but expected_occurrences was {expected_occurrences}. " + "Widen old_string to include more surrounding context, or " + "set expected_occurrences to the correct number." 
+ ) + + new_content = content.replace(old_string, new_string) + return new_content, EditReport( + occurrences_replaced=n, + bytes_before=len(content), + bytes_after=len(new_content), + ) + + +# ---------------------------------------------------------------------- +# apply_unified_diff — minimal patch parser + context-match applier +# ---------------------------------------------------------------------- + +_HUNK_HEADER_RE = re.compile( + r"^@@\s+-(?P<old_start>\d+)(?:,(?P<old_count>\d+))?\s+" + r"\+(?P<new_start>\d+)(?:,(?P<new_count>\d+))?\s+@@" +) + + +@dataclass +class _Hunk: + """One hunk extracted from a unified diff.""" + old_start: int # 1-indexed line number from the @@ header + new_start: int + lines: List[str] # raw lines including the leading char + + +def _parse_unified_diff(diff: str) -> List[_Hunk]: + """Tolerant parser — accepts diffs with or without file headers + (``--- a/file`` / ``+++ b/file``). We only care about the hunks. + """ + hunks: List[_Hunk] = [] + current: Optional[_Hunk] = None + for raw in diff.splitlines(): + m = _HUNK_HEADER_RE.match(raw) + if m: + if current is not None: + hunks.append(current) + current = _Hunk( + old_start=int(m.group("old_start")), + new_start=int(m.group("new_start")), + lines=[], + ) + continue + if current is None: + # Pre-hunk preamble (--- / +++ / diff --git) — ignore. + continue + if not raw: + # An empty line inside a hunk represents a context blank + # line (some tools emit a bare "\n" with no leading space). + current.lines.append(" ") + continue + prefix = raw[0] + if prefix in (" ", "+", "-"): + current.lines.append(raw) + elif prefix == "\\": + # "\ No newline at end of file" — silently skip. + continue + else: + # Foreign line inside a hunk — fail loudly so we never + # silently corrupt the file. + raise EditError( + f"apply_unified_diff: malformed hunk line: {raw!r}. " + "Lines must start with ' ', '+' or '-'." + ) + if current is not None: + hunks.append(current) + if not hunks: + raise EditError( + "apply_unified_diff: no @@ hunks found.
The diff appears empty " + "or only contains file headers." + ) + return hunks + + +def _hunk_old_block(hunk: _Hunk) -> List[str]: + """Return the contiguous list of pre-edit lines (the ones with + ``' '`` or ``'-'`` prefix). These are what we match against.""" + out: List[str] = [] + for ln in hunk.lines: + if not ln: + out.append("") + continue + if ln[0] in (" ", "-"): + out.append(ln[1:]) + return out + + +def _hunk_new_block(hunk: _Hunk) -> List[str]: + """Return the contiguous list of post-edit lines (``' '`` or + ``'+'`` prefix).""" + out: List[str] = [] + for ln in hunk.lines: + if not ln: + out.append("") + continue + if ln[0] in (" ", "+"): + out.append(ln[1:]) + return out + + +def _find_block(haystack: Sequence[str], needle: Sequence[str], near: int) -> int: + """Locate ``needle`` inside ``haystack`` as a contiguous slice. + + Returns the 0-indexed start position. Prefers a match near + ``near`` (1-indexed line from the hunk header, translated by the + caller) so when the file has several identical blocks the patch + lands close to where it was authored. Raises if no exact match. + """ + if not needle: + raise EditError("apply_unified_diff: empty hunk") + matches: List[int] = [] + for i in range(0, len(haystack) - len(needle) + 1): + if list(haystack[i : i + len(needle)]) == list(needle): + matches.append(i) + if not matches: + raise EditError( + "apply_unified_diff: could not locate the hunk's context in " + "the file — the surrounding lines have drifted. Re-read the " + "file and regenerate the diff." + ) + if len(matches) == 1: + return matches[0] + # Multiple identical blocks — pick the one nearest to the hunk header. + target = max(0, near - 1) + return min(matches, key=lambda pos: abs(pos - target)) + + +def apply_unified_diff(content: str, diff: str) -> Tuple[str, EditReport]: + """Apply a unified diff to ``content`` by matching context lines + rather than trusting hunk line numbers. 
+ + Limitations (documented, intentional): + + * Single-file diffs only. If ``diff`` looks like a multi-file + patch (``diff --git`` separator with more than one file), the + caller must split it. + * No fuzz matching. Context must match byte-for-byte. Drift + caused by another concurrent edit raises :class:`EditError` + with an actionable message rather than silently mis-editing. + """ + if diff is None or not diff.strip(): + raise EditError("apply_unified_diff: diff is empty") + + # Detect multi-file diffs BEFORE parsing so the parser doesn't + # trip on the second file's ``diff --git`` header. + gits = diff.count("\ndiff --git ") + (1 if diff.startswith("diff --git ") else 0) + if gits > 1: + raise EditError( + "apply_unified_diff: multi-file diff detected; this helper " + "applies to one file at a time. Split the patch first." + ) + + hunks = _parse_unified_diff(diff) + + lines = content.splitlines(keepends=True) + # Drop the keepends so block matching is line-exact; we'll + # reassemble the line endings from the original where possible. + raw_lines = [ln.rstrip("\n") for ln in lines] + # Preserve the original trailing newline state so we don't + # accidentally drop or add one. + had_trailing_newline = content.endswith("\n") + + output: List[str] = list(raw_lines) + total_replacements = 0 + + # Apply hunks in order, tracking a running offset so the second + # hunk's context match accounts for earlier hunks' line-count + # changes. 
+ offset = 0 + for hunk in hunks: + old_block = _hunk_old_block(hunk) + new_block = _hunk_new_block(hunk) + near = hunk.old_start + offset + pos = _find_block(output, old_block, near=near) + output[pos : pos + len(old_block)] = new_block + offset += len(new_block) - len(old_block) + total_replacements += 1 + + new_content = "\n".join(output) + if had_trailing_newline and not new_content.endswith("\n"): + new_content += "\n" + elif not had_trailing_newline and new_content.endswith("\n"): + # Preserve "no newline at end of file" if the original lacked + # one — only when the diff didn't add it. + new_content = new_content.rstrip("\n") + + return new_content, EditReport( + occurrences_replaced=total_replacements, + bytes_before=len(content), + bytes_after=len(new_content), + ) + + +__all__ = [ + "EditError", + "EditReport", + "apply_edit", + "apply_unified_diff", +] diff --git a/gitpilot/explorer_summary.py b/gitpilot/explorer_summary.py new file mode 100644 index 0000000..cdcf310 --- /dev/null +++ b/gitpilot/explorer_summary.py @@ -0,0 +1,286 @@ +"""Explorer-report compression (Batch B5). + +The Repository Explorer agent produces a free-form "REPOSITORY +EXPLORATION REPORT" that grows linearly with the repo size. On a +200-file repo the file-listing section alone can run 4–6 KB — enough +to crowd the planner's prompt on an 8 k-context model like +llama3:8b. + +This module compresses that report into a fixed-budget summary the +planner sees instead of the raw transcript. Strict properties: + +* **Deterministic.** No LLM call needed; the compression is pure + string manipulation. Easy to test, reproducible across runs. +* **Lossless for facts.** Every concrete file path the planner needs + to validate is preserved (file lists, key files, directory tree). + Only the prose padding and redundant repetition is trimmed. +* **Hard-capped.** Default 800 tokens, configurable. When the raw + report is already under cap we emit it unchanged (no churn). 
+* **Format-stable.** Output is the same "REPOSITORY EXPLORATION + REPORT" header the planner already knows how to read — no prompt + template change in agentic.py beyond passing the compressed string. + +Wired in at the boundary between explorer.kickoff() and the planner +task description, so neither agent changes shape. +""" +from __future__ import annotations + +import logging +import re +from collections import Counter +from dataclasses import dataclass, field +from typing import List, Optional + +from . import flags +from .context_budget import estimate_tokens + +logger = logging.getLogger(__name__) + +FLAG_SUBAGENT_EXPLORER = "subagent_explorer" + +# Tunable budgets. Centralised so a future per-provider override can +# tighten them on small-context models without touching call sites. +DEFAULT_TOKEN_BUDGET = 800 # planner-injection cap +MAX_FILES_LISTED = 60 # absolute hard cap on enumerated paths +MAX_KEY_FILES = 8 +MAX_DIRECTORY_LINES = 25 +PROSE_PARAGRAPH_CAP = 280 # chars per free-text paragraph + + +@dataclass +class CompressionReport: + """Returned alongside the compressed string so the caller can land + a Task row showing concrete before/after numbers.""" + original_tokens: int = 0 + compressed_tokens: int = 0 + files_in_original: int = 0 + files_kept: int = 0 + truncated: bool = False + reason: Optional[str] = None + + +@dataclass +class _ParsedReport: + """Best-effort split of the explorer's free-form report into the + sections we actually use. Missing sections default to empty + strings; we never crash on a malformed report.""" + files_found: List[str] = field(default_factory=list) + key_files: List[str] = field(default_factory=list) + directory_structure: str = "" + file_types: str = "" + other_prose: List[str] = field(default_factory=list) + + +# Regexes for the section headers the explorer's prompt template +# instructs it to emit. Case-insensitive on purpose — small models +# sometimes wobble the capitalisation. 
+_SECTION_RE = re.compile( + r"(?im)^\s*(?P<name>files\s+found|key\s+files|directory\s+structure|file\s+types|repository\s+exploration\s+report)\s*:?\s*$" +) +_BULLET_RE = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+(?P<rest>.+?)\s*$") +# The trailing \b stops short alternatives (ts, js, c, h) from truncating +# longer extensions (tsx, json, cpp, html) to a prefix match. +_PATH_RE = re.compile(r"[\w./\-]+\.(?:md|py|ts|tsx|js|jsx|json|yml|yaml|toml|cfg|ini|txt|rst|sh|bash|go|rs|rb|java|c|h|cpp|hpp|html|css|scss)\b") + + +def _split_into_sections(report: str) -> _ParsedReport: + """Walk the report top-to-bottom, routing lines into buckets based + on the most-recent section header we've seen. Tolerant: unknown + sections fall through to ``other_prose``.""" + parsed = _ParsedReport() + current = "other" + for raw_line in report.splitlines(): + line = raw_line.rstrip() + if not line.strip(): + continue + m = _SECTION_RE.match(line) + if m: + name = m.group("name").lower().replace(" ", "_") + if "files_found" in name: + current = "files_found" + elif "key_files" in name: + current = "key_files" + elif "directory" in name: + current = "directory" + elif "file_types" in name: + current = "file_types" + else: + current = "other" + continue + + if current == "files_found": + # Pull path-like tokens out of the line (handles "- file.py" + # and "1. file.py" and bare "file.py"). + for path in _PATH_RE.findall(line): + if path not in parsed.files_found: + parsed.files_found.append(path) + # Also catch lines that look like bullets but don't have a + # standard extension — we don't want to drop a "Dockerfile". + bm = _BULLET_RE.match(line) + if bm: + rest = bm.group("rest").strip().strip("`'\"") + if rest and "/" in rest or _looks_like_filename(rest): + if rest not in parsed.files_found: + parsed.files_found.append(rest) + elif current == "key_files": + bm = _BULLET_RE.match(line) + if bm: + rest = bm.group("rest").strip().strip("`'\"") + if rest and rest not in parsed.key_files: + parsed.key_files.append(rest) + else: + # Sometimes the explorer just lists key files inline.
+ for path in _PATH_RE.findall(line): + if path not in parsed.key_files: + parsed.key_files.append(path) + elif current == "directory": + parsed.directory_structure += line + "\n" + elif current == "file_types": + parsed.file_types += line + "\n" + else: + parsed.other_prose.append(line) + return parsed + + +def _looks_like_filename(s: str) -> bool: + """Heuristic for tokens like ``Dockerfile``, ``Makefile``, ``LICENSE`` + that don't have an extension but are obviously file names.""" + s = s.strip() + if not s or "\n" in s or " " in s: + return False + if s.startswith(".") and len(s) > 1: + return True # .gitignore, .env + bare = {"Dockerfile", "Makefile", "LICENSE", "CHANGELOG", "NOTICE", "AUTHORS"} + return s in bare + + +def _truncate_directory(structure: str) -> str: + """Keep only the first ``MAX_DIRECTORY_LINES`` lines of the + directory tree. Append a marker when trimmed.""" + lines = [ln for ln in structure.splitlines() if ln.strip()] + if len(lines) <= MAX_DIRECTORY_LINES: + return "\n".join(lines) + return "\n".join(lines[:MAX_DIRECTORY_LINES]) + f"\n …{len(lines) - MAX_DIRECTORY_LINES} more entries" + + +def _file_extension_histogram(files: List[str]) -> str: + """When the explorer hasn't produced a 'File Types' section, derive + one from the file list ourselves. Cheap and always-on.""" + counter: Counter[str] = Counter() + for f in files: + if "." in f.split("/")[-1]: + ext = f.rsplit(".", 1)[-1].lower() + counter[ext] += 1 + else: + counter["(no-ext)"] += 1 + if not counter: + return "" + return ", ".join(f"{ext}={n}" for ext, n in counter.most_common(8)) + + +def compress_exploration_report( + report: str, + *, + token_budget: int = DEFAULT_TOKEN_BUDGET, +) -> tuple[str, CompressionReport]: + """Return a fixed-budget compressed form of the explorer's report. + + Always returns a string the planner can read using its existing + template; never raises on malformed input. 
+ + Falls back to the raw report (no compression) when: + * the feature flag is off, OR + * the raw report already fits under the budget. + """ + metrics = CompressionReport( + original_tokens=estimate_tokens(report or ""), + ) + if not flags.is_on(FLAG_SUBAGENT_EXPLORER, default=True): + metrics.compressed_tokens = metrics.original_tokens + metrics.reason = "flag off" + return report, metrics + + if not report or not report.strip(): + metrics.reason = "empty report" + return report, metrics + + if metrics.original_tokens <= token_budget: + metrics.compressed_tokens = metrics.original_tokens + metrics.reason = "under budget" + return report, metrics + + parsed = _split_into_sections(report) + metrics.files_in_original = len(parsed.files_found) + + # File list — cap and annotate if we trimmed. + files = parsed.files_found[:MAX_FILES_LISTED] + truncated_files = len(parsed.files_found) > MAX_FILES_LISTED + metrics.files_kept = len(files) + metrics.truncated = truncated_files + + # Key files — preserve order, cap. + key = parsed.key_files[:MAX_KEY_FILES] + + # Directory structure — cap to first N lines. + directory = _truncate_directory(parsed.directory_structure) + + # File-type histogram — prefer explorer's own; else compute one. + file_types = parsed.file_types.strip() or _file_extension_histogram(parsed.files_found) + + # Assemble the compressed report using the same header the planner + # already expects. + lines: list[str] = ["REPOSITORY EXPLORATION REPORT", "============================="] + lines.append("") + lines.append("Files Found:") + for path in files: + lines.append(f" - {path}") + if truncated_files: + lines.append( + f" …{len(parsed.files_found) - MAX_FILES_LISTED} more files. " + "Use 'Find files matching a pattern' or 'Search file contents' " + "to drill down." 
+ ) + if key: + lines.append("") + lines.append("Key Files:") + for k in key: + lines.append(f" - {k}") + if directory: + lines.append("") + lines.append("Directory Structure:") + lines.append(directory.rstrip()) + if file_types: + lines.append("") + lines.append(f"File Types: {file_types}") + + compressed = "\n".join(lines) + metrics.compressed_tokens = estimate_tokens(compressed) + + # If our compression somehow blew the budget (very pathological + # input), trim from the file list as the last resort. + while metrics.compressed_tokens > token_budget and len(files) > 5: + files = files[: max(5, int(len(files) * 0.75))] + metrics.files_kept = len(files) + rebuilt = [ + "REPOSITORY EXPLORATION REPORT", "=============================", "", + "Files Found:", + *(f" - {p}" for p in files), + f" …{len(parsed.files_found) - len(files)} more files. " + "Use 'Find files matching a pattern' or 'Search file contents' to drill down.", + ] + if key: + rebuilt.extend(["", "Key Files:", *(f" - {k}" for k in key)]) + if directory: + rebuilt.extend(["", "Directory Structure:", directory.rstrip()]) + if file_types: + rebuilt.extend(["", f"File Types: {file_types}"]) + compressed = "\n".join(rebuilt) + metrics.compressed_tokens = estimate_tokens(compressed) + + return compressed, metrics + + +__all__ = [ + "FLAG_SUBAGENT_EXPLORER", + "DEFAULT_TOKEN_BUDGET", + "CompressionReport", + "compress_exploration_report", +] diff --git a/gitpilot/grep_backend.py b/gitpilot/grep_backend.py new file mode 100644 index 0000000..78db1f7 --- /dev/null +++ b/gitpilot/grep_backend.py @@ -0,0 +1,275 @@ +"""Grep backend — pure-Python regex search across repo files. + +Powers the ``Search file contents`` agent tool. Designed for local +on-prem use first; no shell-out, no external dependency. When +``ripgrep`` is present it is used as a fast path; otherwise we fall +back to a hand-written Python loop that's still fast on the typical +GitPilot repo (a few hundred files). 
+ +Contract (pinned by the tests): + +* Returns a list of dicts: ``{path, line, match}``. +* Truncates above ``max_results`` and includes a ``truncated=True`` + flag in the metadata so the caller can refine. +* Result order: stable — files are sorted, lines within a file are + in ascending order. Reproducible runs for tests. + +Security: +* The pattern is a regular expression (validated up front). No + shell injection: we never pass it to a shell when using the rg + binary — only via subprocess args. +* The ``path_pattern`` filter goes through the same glob → regex + translator used by Batch B1, so the same `/`-aware semantics apply. +""" +from __future__ import annotations + +import logging +import re +import shutil +import subprocess +from dataclasses import dataclass, field +from typing import Iterable, List, Optional + +logger = logging.getLogger(__name__) + +# Hard cap — never return more than this regardless of caller value. +GREP_HARD_MAX_RESULTS = 500 +GREP_DEFAULT_MAX_RESULTS = 100 +RIPGREP_TIMEOUT_S = 10 + + +@dataclass +class GrepHit: + path: str + line: int + match: str + + +@dataclass +class GrepResult: + hits: List[GrepHit] = field(default_factory=list) + truncated: bool = False + backend: str = "python" # "ripgrep" | "python" + error: Optional[str] = None + + +# ---------------------------------------------------------------------- +# Public entry point +# ---------------------------------------------------------------------- + +def grep( + files: dict[str, str], + pattern: str, + *, + case_insensitive: bool = False, + max_results: int = GREP_DEFAULT_MAX_RESULTS, + path_filter: Optional[re.Pattern[str]] = None, +) -> GrepResult: + """Run a regex search over the supplied (path → content) mapping. + + The caller is responsible for assembling the file map — for the + GitHub-only path that means downloading the relevant files first; + for the local-checkout path that's just ``Path.read_text`` per + matching file. 
Keeping the backend file-source-agnostic lets us + test it without touching GitHub or the disk. + """ + cap = max(1, min(GREP_HARD_MAX_RESULTS, int(max_results))) + + try: + flags = re.IGNORECASE if case_insensitive else 0 + rx = re.compile(pattern, flags) + except re.error as exc: + return GrepResult(error=f"invalid regex: {exc}") + + hits: List[GrepHit] = [] + for path in sorted(files.keys()): + if path_filter is not None and not path_filter.match(path): + continue + content = files[path] + if not content: + continue + for lineno, line in enumerate(content.splitlines(), start=1): + if rx.search(line): + hits.append(GrepHit(path=path, line=lineno, match=line.rstrip())) + if len(hits) >= cap: + return GrepResult(hits=hits, truncated=True, backend="python") + + return GrepResult(hits=hits, truncated=False, backend="python") + + +# ---------------------------------------------------------------------- +# Local-checkout fast path: shell out to ripgrep when available +# ---------------------------------------------------------------------- + +def grep_local( + workdir: str, + pattern: str, + *, + case_insensitive: bool = False, + max_results: int = GREP_DEFAULT_MAX_RESULTS, + glob_filter: Optional[str] = None, +) -> GrepResult: + """Search files under ``workdir`` using ripgrep if available, + falling back to a pure-Python walk otherwise. + + Used for the local-checkout / local-git modes. GitHub-only + sessions go through :func:`grep` instead because they don't have + a tree on disk. + """ + if shutil.which("rg"): + return _grep_via_ripgrep( + workdir, + pattern, + case_insensitive=case_insensitive, + max_results=max_results, + glob_filter=glob_filter, + ) + # Pure-Python fallback — walk the tree, read each file, match. + # Kept here (rather than in the GitHub helper) because the local + # path benefits from a file-handle-streaming walk that doesn't + # materialise the whole repo into memory. 
+    return _grep_via_python_walk(
+        workdir,
+        pattern,
+        case_insensitive=case_insensitive,
+        max_results=max_results,
+        glob_filter=glob_filter,
+    )
+
+
+def _grep_via_ripgrep(
+    workdir: str,
+    pattern: str,
+    *,
+    case_insensitive: bool,
+    max_results: int,
+    glob_filter: Optional[str],
+) -> GrepResult:
+    cap = max(1, min(GREP_HARD_MAX_RESULTS, int(max_results)))
+    argv = [
+        "rg",
+        "--no-config",  # ignore any config file set via RIPGREP_CONFIG_PATH
+        "--no-heading",
+        "--line-number",
+        "--with-filename",
+        "--color", "never",
+        # --max-count limits matching lines per file, not in total; the
+        # overall cap is enforced while parsing the output below.
+        "--max-count", str(cap),
+        # Binary files are skipped by ripgrep's default behaviour, which
+        # avoids token pollution and matches what Claude Code / Cursor do
+        # by default. Do not pass --text here: that flag makes ripgrep
+        # search binary files as if they were text.
+    ]
+    if case_insensitive:
+        argv.append("-i")
+    if glob_filter:
+        argv.extend(["-g", glob_filter])
+    argv.extend(["--", pattern, workdir])
+
+    try:
+        proc = subprocess.run(
+            argv,
+            capture_output=True,
+            text=True,
+            timeout=RIPGREP_TIMEOUT_S,
+            check=False,
+        )
+    except subprocess.TimeoutExpired:
+        return GrepResult(
+            error=f"ripgrep timed out after {RIPGREP_TIMEOUT_S}s",
+            backend="ripgrep",
+        )
+    except FileNotFoundError:
+        # rg disappeared between which() and run(); degrade gracefully.
+        return _grep_via_python_walk(
+            workdir, pattern,
+            case_insensitive=case_insensitive,
+            max_results=max_results,
+            glob_filter=glob_filter,
+        )
+
+    # rg exits 1 when there are zero matches; that is not an error.
+    if proc.returncode not in (0, 1):
+        err = proc.stderr.strip().splitlines()
+        return GrepResult(error="; ".join(err[:3]) if err else "ripgrep failed", backend="ripgrep")
+
+    hits: List[GrepHit] = []
+    truncated = False
+    for raw in proc.stdout.splitlines():
+        # Format: path:line:match
+        parts = raw.split(":", 2)
+        if len(parts) < 3:
+            continue
+        path, lineno_s, match = parts
+        try:
+            lineno = int(lineno_s)
+        except ValueError:
+            continue
+        # Trim the leading workdir prefix so paths look repo-relative.
+ if path.startswith(workdir + "/"): + path = path[len(workdir) + 1:] + hits.append(GrepHit(path=path, line=lineno, match=match.rstrip())) + if len(hits) >= cap: + truncated = True + break + return GrepResult(hits=hits, truncated=truncated, backend="ripgrep") + + +def _grep_via_python_walk( + workdir: str, + pattern: str, + *, + case_insensitive: bool, + max_results: int, + glob_filter: Optional[str], +) -> GrepResult: + import pathlib + + try: + flags = re.IGNORECASE if case_insensitive else 0 + rx = re.compile(pattern, flags) + except re.error as exc: + return GrepResult(error=f"invalid regex: {exc}") + + # Local import to avoid a hard module-load dep when grep isn't used. + from .agent_tools import _glob_to_regex + + pf = _glob_to_regex(glob_filter) if glob_filter else None + cap = max(1, min(GREP_HARD_MAX_RESULTS, int(max_results))) + root = pathlib.Path(workdir) + hits: List[GrepHit] = [] + # Walk deterministically so tests are reproducible. + for path in sorted(root.rglob("*")): + if not path.is_file(): + continue + rel = path.relative_to(root).as_posix() + if pf is not None and not pf.match(rel): + continue + try: + content = path.read_text(encoding="utf-8", errors="replace") + except (OSError, UnicodeDecodeError): + continue + for lineno, line in enumerate(content.splitlines(), start=1): + if rx.search(line): + hits.append(GrepHit(path=rel, line=lineno, match=line.rstrip())) + if len(hits) >= cap: + return GrepResult(hits=hits, truncated=True, backend="python") + return GrepResult(hits=hits, truncated=False, backend="python") + + +# ---------------------------------------------------------------------- +# Formatter for the agent tool wrapper +# ---------------------------------------------------------------------- + +def format_result(result: GrepResult, *, pattern: str) -> str: + if result.error: + return f"Error: {result.error}" + if not result.hits: + return f"No matches for pattern: {pattern}" + lines = [f"Found {len(result.hits)} match(es) for: 
{pattern}"] + for hit in result.hits: + lines.append(f" {hit.path}:{hit.line}: {hit.match[:200]}") + if result.truncated: + lines.append( + f"…truncated at {len(result.hits)} hits. " + "Narrow the pattern or pass max_results to see more." + ) + return "\n".join(lines) diff --git a/gitpilot/local_tools.py b/gitpilot/local_tools.py index 05b18d2..476b0cd 100644 --- a/gitpilot/local_tools.py +++ b/gitpilot/local_tools.py @@ -165,11 +165,27 @@ def git_log(count: str = "10") -> str: def run_command(command: str, timeout: str = "120") -> str: """Run a shell command in the workspace directory. Returns stdout, stderr, and exit code. - Examples: 'npm test', 'python -m pytest', 'make build', 'ls -la'.""" + Examples: 'npm test', 'python -m pytest', 'make build', 'ls -la'. + + When the user has selected a non-local sandbox in Settings (e.g. + MatrixLab), this tool transparently delegates to that backend so + the agent's autonomous build/test loop runs in the same isolation + the chat UI's Run button uses. With the default ``subprocess`` + backend the call still goes through :class:`SubprocessSandbox`, + which jails cwd to the workspace and scrubs secrets — strictly + stronger than the previous host-direct path.""" ws = _require_workspace() + timeout_int = _coerce_timeout(timeout) + try: + # Prefer the configured sandbox. Falls back to the legacy + # TerminalSession path only on import errors so an environment + # without httpx still runs the agent (existing behaviour). 
+ return _run_via_sandbox(command, timeout_int, ws.path) + except _SandboxFallback: + pass try: session = TerminalSession(workspace_path=ws.path) - result = _run_async(_executor.execute(session, command, int(timeout))) + result = _run_async(_executor.execute(session, command, timeout_int)) output = f"Exit code: {result.exit_code}\n" if result.stdout: output += f"--- stdout ---\n{result.stdout}\n" @@ -186,6 +202,199 @@ def run_command(command: str, timeout: str = "120") -> str: return f"Error: {e}" +@tool("Run code in sandbox") +def run_in_sandbox(language: str, code: str, timeout: str = "120") -> str: + """Execute a self-contained code snippet in the configured sandbox. + + Use this when you want to verify that code you produced *actually + works* before handing it back to the user — write the snippet, + call this tool, read the captured stdout / stderr / exit code, + and iterate. The snippet runs in an ephemeral tempdir (not the + workspace), so file-system side effects don't pollute the repo. + + Supported languages: python, javascript (or js/node), bash (or + sh/shell). Returns a single text block with the exit code, + stdout, stderr, duration, and backend (Local subprocess / + MatrixLab) so you can tell which sandbox executed the snippet. + + Error retrieval is the point of this tool: when the snippet + fails, the full stderr trace comes back verbatim — the agent + should read it, decide how to fix the bug, and re-run.""" + timeout_int = _coerce_timeout(timeout) + try: + return _run_snippet_via_sandbox(language, code, timeout_int) + except _SandboxFallback as exc: + return f"Error: sandbox unavailable ({exc}); cannot run snippet." 
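Editor's note: the two tools above rely on a sentinel exception to decide when to leave the sandbox path. A minimal sketch of that pattern follows, assuming nothing about GitPilot's internals; `SandboxFallback`, `run_with_fallback`, and the three runner functions are hypothetical names used only for illustration.

```python
# Illustrative sketch only: every name here is hypothetical, not GitPilot's API.
class SandboxFallback(Exception):
    """Signals that the sandbox path is unusable and a fallback should run."""


def run_with_fallback(command, sandbox_runner, legacy_runner):
    """Try the sandbox first; use the legacy runner only on SandboxFallback."""
    try:
        return sandbox_runner(command)
    except SandboxFallback:
        # Only the sentinel triggers the fallback; any other error propagates.
        return legacy_runner(command)


def working_sandbox(command):
    return f"sandbox ran: {command}"


def broken_sandbox(command):
    # Stands in for an import failure such as a missing HTTP client.
    raise SandboxFallback("sandbox dependencies missing")


def legacy(command):
    return f"legacy ran: {command}"


print(run_with_fallback("ls -la", working_sandbox, legacy))
print(run_with_fallback("ls -la", broken_sandbox, legacy))
```

Catching only the sentinel keeps genuine sandbox failures visible instead of silently re-running the command on the host.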
+ + +# --------------------------------------------------------------------- +# Sandbox helpers +# --------------------------------------------------------------------- + +class _SandboxFallback(Exception): + """Raised when the sandbox path is unusable and the caller should + fall back to the legacy TerminalSession executor.""" + + +def _coerce_timeout(value: object) -> int: + try: + n = int(str(value)) + except (TypeError, ValueError): + return 120 + if n <= 0: + return 120 + return min(n, 600) + + +def _format_sandbox_output(result, label: str) -> str: + """Render a SandboxResult / SandboxRunResponse-shaped object as the + same text block the agent has been reading from ``run_command``, + so existing prompt parsing keeps working — just with the backend + line appended so the agent (and the user reading the trace) can + see which sandbox ran the command.""" + backend = getattr(result, "backend", None) or "subprocess" + pretty = { + "subprocess": "local subprocess", + "matrixlab": "MatrixLab", + "off": "pass-through (host)", + }.get(backend, backend) + lines = [f"Sandbox: {pretty}", f"Command: {label}", f"Exit code: {result.exit_code}"] + if getattr(result, "duration_ms", None) is not None: + lines.append(f"Duration: {result.duration_ms} ms") + if result.stdout: + lines.append("--- stdout ---") + lines.append(result.stdout) + if result.stderr: + lines.append("--- stderr ---") + lines.append(result.stderr) + if getattr(result, "timed_out", False): + lines.append("WARNING: Command timed out") + if getattr(result, "truncated", False): + lines.append("WARNING: Output was truncated") + sbid = getattr(result, "sandbox_id", None) + if sbid: + lines.append(f"sandbox_id: {sbid}") + return "\n".join(lines) + "\n" + + +def _run_via_sandbox(command: str, timeout: int, workspace_path) -> str: + """Route ``run_command`` through the configured sandbox backend. 
+ + Raises :class:`_SandboxFallback` so the caller can drop to the + legacy TerminalSession path if the sandbox can't be constructed + (e.g. httpx missing in a stripped runtime).""" + try: + from pathlib import Path + + from .sandbox import ( + BACKEND_MATRIXLAB, + BACKEND_OFF, + BACKEND_SUBPROCESS, + MatrixLabSandbox, + NullSandbox, + SandboxPolicy, + SandboxRunError, + SandboxUnavailableError, + SubprocessSandbox, + ) + from .settings import get_settings + except Exception as exc: # noqa: BLE001 + raise _SandboxFallback(str(exc)) from exc + + cfg = get_settings().sandbox + backend = (cfg.backend or BACKEND_SUBPROCESS).strip().lower() + + # MatrixLab's /repo/run endpoint requires a real ``repo_url`` — + # that's the contract for cloning + running CI against a remote + # repo, not for arbitrary in-workspace shell commands. Route + # workspace commands through the snippet path instead (POST + # /api/sandbox/run with language=bash), which already dispatches + # to MatrixLab /code/run. Keeps the agent's run_command working + # whichever backend the user picked. + if backend == BACKEND_MATRIXLAB: + return _run_snippet_via_sandbox("bash", command, timeout) + + policy = SandboxPolicy( + workspace=Path(workspace_path), + timeout_sec=timeout, + allow_network=cfg.allow_network, + image=cfg.matrixlab_image or None, + ) + if backend == BACKEND_OFF: + sb = NullSandbox(policy) + else: + sb = SubprocessSandbox(policy) + + # Run + close in a SINGLE event loop. MatrixLabSandbox would + # lazily build an httpx.AsyncClient on first use; closing it in a + # different loop than it was created in is the textbook asyncio + # antipattern (RuntimeError: Event loop is closed). Two separate + # asyncio.run() calls would do exactly that. (For the + # subprocess/null path the close is a no-op, but keeping the + # pattern uniform means future backends can rely on it.) 
+ async def _run_and_close(): + try: + return await sb.run(command, timeout=timeout) + finally: + aclose = getattr(sb, "aclose", None) + if aclose is not None: + try: + await aclose() + except Exception: # noqa: BLE001 + pass + try: + result = _run_async(_run_and_close()) + except SandboxUnavailableError as exc: + return f"Error: sandbox backend {backend!r} unreachable: {exc}\n" + except SandboxRunError as exc: + return f"Error: sandbox backend {backend!r} reported an error: {exc}\n" + except PermissionError as exc: + return f"Permission denied by sandbox policy: {exc}\n" + return _format_sandbox_output(result, command) + + +def _run_snippet_via_sandbox(language: str, code: str, timeout: int) -> str: + """Execute a fenced snippet by POSTing to GitPilot's own + /api/sandbox/run endpoint so the agent and the chat UI share one + code path. Going via the HTTP surface (rather than reaching into + sandbox_api internals) keeps the lifecycle / cleanup behaviour + identical between the two callers.""" + try: + import os + + import httpx + except Exception as exc: # noqa: BLE001 + raise _SandboxFallback(str(exc)) from exc + + port = os.environ.get("GITPILOT_PORT") or "8765" + base = os.environ.get("GITPILOT_INTERNAL_URL") or f"http://127.0.0.1:{port}" + body = {"language": language, "code": code, "timeout_sec": timeout} + try: + with httpx.Client(timeout=timeout + 10) as client: + resp = client.post(f"{base}/api/sandbox/run", json=body) + except httpx.HTTPError as exc: + return f"Error: could not reach the in-process sandbox API: {exc}\n" + if resp.status_code >= 400: + try: + detail = resp.json().get("detail", resp.text) + except Exception: # noqa: BLE001 + detail = resp.text + return f"Sandbox error ({resp.status_code}): {detail}\n" + data = resp.json() + + class _R: + backend = data.get("backend") + exit_code = data.get("exit_code") + stdout = data.get("stdout", "") + stderr = data.get("stderr", "") + duration_ms = data.get("duration_ms") + timed_out = 
data.get("timed_out", False) + truncated = data.get("truncated", False) + sandbox_id = data.get("sandbox_id") + + return _format_sandbox_output(_R(), f"{language} ") + + # ----------------------------------------------------------------------- # Exports # ----------------------------------------------------------------------- @@ -207,6 +416,7 @@ def run_command(command: str, timeout: str = "120") -> str: LOCAL_SHELL_TOOLS = [ run_command, + run_in_sandbox, ] LOCAL_TOOLS = LOCAL_FILE_TOOLS + LOCAL_GIT_TOOLS + LOCAL_SHELL_TOOLS diff --git a/gitpilot/query_router.py b/gitpilot/query_router.py new file mode 100644 index 0000000..b78f431 --- /dev/null +++ b/gitpilot/query_router.py @@ -0,0 +1,456 @@ +"""Deterministic query router (Batch B9). + +Classifies a user goal into one of a handful of *intents* (fix / +find / info / create / delete / modify), extracts any files the user +mentioned, and emits a strategy hint the planner can either follow +or override. + +Pure Python, no LLM call. The point is that **small local models +(llama3:8b)** sometimes fail to pick the right tool even with rich +descriptions — a deterministic pre-router keeps them on the rails. +Big models can ignore the hint when their judgment is better than +the heuristic; we treat it as advisory, not constraining. + +Auto-RAG decision: +* The router signals ``auto_index_repo=True`` only when the query + is *fuzzy* (natural-language, no symbol tokens, no path mentions) + AND the repo is big enough to benefit (>= 50 files) AND a RAG + index doesn't already exist. +* When consent has not been granted yet, the API layer turns that + signal into an INDEX plan step (see Batch B9 design). When + consent IS granted, the API layer auto-builds in the background. + +This module returns the *decision*; the API layer translates it +into either a plan step or a background task. 
+""" +from __future__ import annotations + +import re +from dataclasses import dataclass, field +from typing import List, Literal, Optional, Sequence + +# ---------------------------------------------------------------------- +# Constants — exposed so tests can pin the heuristics +# ---------------------------------------------------------------------- + +INTENT_LITERALS = ( + "fix", + "find", + "info", + "create", + "delete", + "modify", + "unknown", +) + +# Per-intent trigger words, lowercased. Order in this table = priority +# when several intents match (first hit wins, except "fix" beats +# "modify" because every fix is a modify but not every modify is a fix). +_INTENT_TRIGGERS: list[tuple[str, tuple[str, ...]]] = [ + ("fix", ("fix ", "bug", " error", "broken", "doesn't work", + "doesnt work", "crash", "traceback", "exception", + "fails", "failing", "regression")), + ("delete", ("delete ", "remove ", "drop ", "get rid of", + "uninstall", "clean up")), + ("create", ("create ", "add ", "generate ", "new file", + "write a new", "build a", "make a", "scaffold")), + ("modify", ("modify ", "change ", "update ", "rename ", "refactor ", + "replace ", "rewrite ", "convert ", "migrate ", "edit ")), + ("find", ("where ", "find ", "search ", "locate ", "show me ", + "list ", "which file", "look for")), + ("info", ("what is ", "what does ", "explain ", "describe ", + "how does ", "how do ", "tell me ", "summari", + "overview", "what do you think")), +] + +# Repo-file extension whitelist for path extraction — every match +# must end in a "real" extension OR be a well-known extensionless +# file. Keeps the extractor from grabbing words like "asyncio" that +# happen to contain dots in surrounding punctuation. 
+_PATH_EXTENSIONS = ( + "py", "ts", "tsx", "js", "jsx", "mjs", "cjs", + "go", "rs", "rb", "java", "kt", "scala", "swift", + "c", "h", "cpp", "hpp", "cc", "cxx", + "md", "rst", "txt", "html", "css", "scss", "sass", + "json", "yml", "yaml", "toml", "ini", "cfg", + "sh", "bash", "zsh", "fish", + "sql", "graphql", "proto", +) +_EXTENSIONLESS_KEY_FILES = ( + "Dockerfile", "Makefile", "LICENSE", "CHANGELOG", "NOTICE", + "AUTHORS", "Procfile", "Gemfile", "Rakefile", ".gitignore", + ".env", ".env.example", ".dockerignore", +) + +# Indentation-sensitive extensions — the planner gets a stronger hint +# to use Edit (surgical) rather than Write (regenerate). +_INDENTATION_SENSITIVE_EXTS = ( + "py", "yml", "yaml", "haml", "slim", "pug", "jade", +) + +# Generated / lock files we refuse to MODIFY. +_FORBIDDEN_EDIT_EXTS = ( + "lock", "min.js", "min.css", +) +_FORBIDDEN_EDIT_NAMES = ( + "poetry.lock", "package-lock.json", "yarn.lock", + "Cargo.lock", "Gemfile.lock", "go.sum", +) + +# Quoted-path regex covers `'README.md'`, `"src/main.py"`, `` `LICENSE` ``. +_QUOTED_PATH_RE = re.compile(r"""[`'"]([\w./\-]+)[`'"]""") +# Bareword path: ``src/main.py``, ``README.md``. Must contain at +# least one ".ext" or "/". +_BAREWORD_PATH_RE = re.compile( + r"(? 
Intent: + q = " " + goal.lower() + " " + for intent, triggers in _INTENT_TRIGGERS: + for t in triggers: + if t in q: + return intent # type: ignore[return-value] + return "unknown" + + +def _extract_path_candidates(goal: str) -> List[str]: + """Pull every plausibly-file-shaped token out of the goal.""" + candidates: list[str] = [] + seen: set[str] = set() + + def _push(tok: str) -> None: + tok = tok.strip().strip(".,:;()") + if not tok or tok in seen: + return + seen.add(tok) + candidates.append(tok) + + for m in _QUOTED_PATH_RE.findall(goal): + _push(m) + for m in _BAREWORD_PATH_RE.findall(goal): + _push(m) + return candidates + + +def _verify_against_repo( + candidates: Sequence[str], + repo_files: Optional[Sequence[str]], +) -> List[str]: + """Drop candidates that don't exist in the repo. + + Match strategy: exact path or basename. Returns the canonical + repo path so the planner's prompt always uses the same casing + as the actual file (lower vs upper-case readme.md vs README.md). + """ + if not repo_files: + return list(candidates) + file_set = set(repo_files) + basename_map: dict[str, str] = {} + for p in repo_files: + basename = p.rsplit("/", 1)[-1] + basename_map.setdefault(basename, p) + out: list[str] = [] + for tok in candidates: + if tok in file_set: + out.append(tok) + elif tok in basename_map: + out.append(basename_map[tok]) + # Case-insensitive last-chance match. + else: + lower = tok.lower() + for p in repo_files: + if p.lower() == lower: + out.append(p) + break + return out + + +def _ext_of(path: str) -> str: + name = path.rsplit("/", 1)[-1] + if name in _EXTENSIONLESS_KEY_FILES: + return "" + # Special-case multi-dot extensions before the simple split. + lname = name.lower() + if lname.endswith(".min.js"): + return "min.js" + if lname.endswith(".min.css"): + return "min.css" + if "." 
not in name: + return "" + return name.rsplit(".", 1)[-1].lower() + + +def _is_fuzzy(goal: str) -> bool: + """A query is fuzzy when it reads like natural language — no + symbol-shaped tokens, no path mentions, ≥ N content words.""" + g = goal.strip() + if not g: + return False + if _QUOTED_PATH_RE.search(g) or _BAREWORD_PATH_RE.search(g): + return False + if _SYMBOL_RE.search(g): + return False + words = [w for w in re.split(r"\s+", g) if len(w) > 2] + return len(words) >= _FUZZY_MIN_WORDS + + +def _looks_like_symbol_search(goal: str) -> bool: + return bool(_SYMBOL_RE.search(goal)) + + +def _file_policy_notes(target_files: Sequence[str]) -> tuple[EditStrategy, str]: + """Roll up per-file policy into one human-readable note for the + planner prompt and one machine-readable strategy.""" + if not target_files: + return "surgical", "" + + forbidden = [] + indentation_sensitive = [] + for p in target_files: + ext = _ext_of(p) + name = p.rsplit("/", 1)[-1] + if ext in _FORBIDDEN_EDIT_EXTS or name in _FORBIDDEN_EDIT_NAMES: + forbidden.append(p) + elif ext in _INDENTATION_SENSITIVE_EXTS: + indentation_sensitive.append(p) + + if forbidden: + notes = ( + f"Refuse to MODIFY: {', '.join(forbidden)} — these are " + "generated / lock files. Edit the source manifest instead." + ) + return "reject", notes + + if indentation_sensitive: + notes = ( + "Use 'Edit a section of a file' (surgical) — the file " + "extension is indentation-sensitive. Quote leading " + "whitespace exactly when constructing old_string." + ) + return "surgical", notes + + return "surgical", "Use 'Edit a section of a file' for any MODIFY action." 
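Editor's note: the policy roll-up above can be sketched in isolation. The tables below are trimmed copies of the router's constants, and `ext_of` and `edit_strategy` are simplified, hypothetical stand-ins for `_ext_of` and `_file_policy_notes`; they omit the extensionless-file list and the human-readable notes.

```python
# Illustrative sketch: trimmed tables and hypothetical helper names.
FORBIDDEN_EXTS = ("lock", "min.js", "min.css")
FORBIDDEN_NAMES = ("package-lock.json", "go.sum")
MULTI_DOT_EXTS = ("min.js", "min.css")


def ext_of(path: str) -> str:
    """Return the lowercased extension, treating .min.js / .min.css as one unit."""
    name = path.rsplit("/", 1)[-1].lower()
    for multi in MULTI_DOT_EXTS:
        if name.endswith("." + multi):
            return multi
    if "." not in name:
        return ""
    return name.rsplit(".", 1)[-1]


def edit_strategy(paths) -> str:
    """Reject the whole request if any target is a generated or lock file."""
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if ext_of(path) in FORBIDDEN_EXTS or name in FORBIDDEN_NAMES:
            return "reject"
    return "surgical"


print(ext_of("static/app.min.js"))
print(edit_strategy(["src/app.py", "yarn.lock"]))
print(edit_strategy(["src/app.py"]))
```

Checking the multi-dot suffixes before the plain split is what lets `app.min.js` be recognised as generated rather than as ordinary `js`.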
+ + +# ---------------------------------------------------------------------- +# Public entry point +# ---------------------------------------------------------------------- + +DEFAULT_FUZZY_REPO_SIZE_FOR_RAG = 50 + + +def classify( + goal: str, + *, + repo_files: Optional[Sequence[str]] = None, + rag_index_exists: bool = False, + force_no_rag: bool = False, +) -> RouterDecision: + """Classify a user goal into a :class:`RouterDecision`. + + Pure: same inputs → identical outputs, no I/O, no LLM call. + + ``repo_files`` is optional but recommended — without it we can't + verify that the files the user mentioned actually exist. + """ + if not goal or not goal.strip(): + return RouterDecision( + intent="unknown", + rationale="empty goal", + tool_priority=["Get repository summary"], + ) + + intent = _detect_intent(goal) + raw_candidates = _extract_path_candidates(goal) + targets = _verify_against_repo(raw_candidates, repo_files) + fuzzy = _is_fuzzy(goal) + repo_size = len(repo_files) if repo_files is not None else 0 + too_small_for_rag = repo_size > 0 and repo_size < DEFAULT_FUZZY_REPO_SIZE_FOR_RAG + + # RAG / semantic search is only useful for *read-leaning* intents + # — finding something, diagnosing a fix, refactoring across files. + # Informational queries are answered from the repo map (no need + # for vectors); create / delete are structural and benefit from + # Glob, not embeddings. + _rag_eligible_intent = intent in ("find", "fix", "modify", "unknown") + rag_recommended = ( + fuzzy + and _rag_eligible_intent + and not force_no_rag + and not too_small_for_rag + ) + auto_index_repo = ( + rag_recommended + and not rag_index_exists + and not too_small_for_rag + ) + + edit_strategy, file_notes = _file_policy_notes(targets) + + tools: list[str] + if intent == "info": + # Informational: read README + repo map, no plan. 
+ tools = ["Read file content", "Get repository summary"] + elif intent in ("fix", "modify") and targets: + tools = ["Read file content", "Edit a section of a file"] + if edit_strategy == "reject": + tools = ["Read file content"] + elif intent in ("fix", "modify") and not targets: + # Need to find the file first. + if rag_recommended: + tools = [ + "Find code by semantic search", + "Search file contents", + "Read file content", + "Edit a section of a file", + ] + else: + tools = [ + "Search file contents", + "Read file content", + "Edit a section of a file", + ] + elif intent == "find": + if rag_recommended: + tools = ["Find code by semantic search", "Search file contents", + "Read file content"] + elif _looks_like_symbol_search(goal): + tools = ["Search file contents", "Read file content"] + else: + tools = ["Find files matching a pattern", "Search file contents", + "Read file content"] + elif intent == "create": + tools = ["Get repository summary", "Read file content", + "Write or update a file in the repository"] + elif intent == "delete": + tools = ["Find files matching a pattern", + "Delete a file from the repository"] + else: + # unknown — default to the safe exploration set. 
+ tools = [ + "Get repository summary", + "Find files matching a pattern", + "Search file contents", + "Read file content", + ] + + rationale = _build_rationale( + intent=intent, targets=targets, rag=rag_recommended, + auto_index=auto_index_repo, fuzzy=fuzzy, + ) + + return RouterDecision( + intent=intent, + target_files=targets, + tool_priority=tools, + rag_recommended=rag_recommended, + auto_index_repo=auto_index_repo, + edit_strategy=edit_strategy, + file_policy_notes=file_notes, + rationale=rationale, + repo_too_small_for_rag=too_small_for_rag, + ) + + +def _build_rationale( + *, intent: Intent, targets: Sequence[str], rag: bool, + auto_index: bool, fuzzy: bool, +) -> str: + parts = [f"intent={intent}"] + if targets: + parts.append(f"targets={','.join(targets[:3])}") + if rag: + parts.append("rag=preferred") + if auto_index: + parts.append("auto-index=requested") + if fuzzy and not rag: + parts.append("fuzzy") + return " · ".join(parts) + + +# ---------------------------------------------------------------------- +# Hint rendering — what the planner sees inside its prompt +# ---------------------------------------------------------------------- + +def render_planner_hint(decision: RouterDecision) -> str: + """Render the decision as a small markdown block to splice into + the planner's context_pack. Advisory tone — the planner may + override when context demands it.""" + lines = [ + "## ROUTING HINT (advisory — override if the goal demands more)", + f"- Intent: **{decision.intent}**", + ] + if decision.target_files: + lines.append( + "- Likely target files: " + ", ".join( + f"`{p}`" for p in decision.target_files[:5] + ) + ) + if decision.tool_priority: + lines.append( + "- Preferred tools (in order): " + + " → ".join(f"`{t}`" for t in decision.tool_priority) + ) + if decision.file_policy_notes: + lines.append(f"- File policy: {decision.file_policy_notes}") + if decision.rag_recommended: + lines.append( + "- Semantic search recommended for this fuzzy query. 
" + "Prefer `Find code by semantic search` before `Search file contents`." + ) + if decision.auto_index_repo: + lines.append( + "- A semantic index has not been built for this repo yet. " + "Include a Step 1 with action `INDEX` so the user can " + "approve the one-time build (~30 s, local, ~12 MB)." + ) + if decision.intent == "info": + lines.append( + "- This is an informational query. Produce a plan with " + "READ-only file actions and a substantive summary — do NOT " + "create / modify / delete files." + ) + return "\n".join(lines) + + +__all__ = [ + "DEFAULT_FUZZY_REPO_SIZE_FOR_RAG", + "RouterDecision", + "classify", + "render_planner_hint", +] diff --git a/gitpilot/rag/__init__.py b/gitpilot/rag/__init__.py new file mode 100644 index 0000000..60aaada --- /dev/null +++ b/gitpilot/rag/__init__.py @@ -0,0 +1,64 @@ +"""GitPilot local RAG pipeline (Batch B7). + +On-prem-first design — no cloud calls, no API keys. Defaults to +ChromaDB's bundled MiniLM-L6-v2 (downloaded once, ~80 MB on disk). +A pure-Python ``HashingEmbedder`` is shipped alongside as a dependency- +free fallback for tests and minimal-footprint deployments. + +Public entry points: + +* :func:`build_index_from_files` — given an iterable of + ``(path, content)`` pairs, chunk + embed + persist to ChromaDB. +* :func:`retrieve_top_k` — embed a query, return the best-matching + chunks across the persisted index. +* :func:`semantic_search_tool` — the CrewAI tool wrapper. + +Storage layout: + + //// ← Chroma persistent client + ////meta.json + { + "indexed_files": {"": ""}, + "embedder": "default" | "hashing", + "embedding_dim": 384, + "updated_at": "ISO-8601", + } + +Flags: + +* ``rag_retrieval`` — gates the agent tool registration and the + /api/repos/.../index endpoints. Default **off** (opt-in apex). 
+""" +from __future__ import annotations + +FLAG_RAG_RETRIEVAL = "rag_retrieval" + +from .chunker import Chunk, chunk_file, chunk_files # noqa: E402 +from .embedder import ( # noqa: E402 + Embedder, + HashingEmbedder, + get_default_embedder, +) +from .indexer import ( # noqa: E402 + IndexBuildReport, + IndexMeta, + build_index_from_files, +) +from .retriever import RetrievedChunk, retrieve_top_k # noqa: E402 +from .store import RagStore # noqa: E402 + +__all__ = [ + "FLAG_RAG_RETRIEVAL", + "Chunk", + "Embedder", + "HashingEmbedder", + "IndexBuildReport", + "IndexMeta", + "RagStore", + "RetrievedChunk", + "build_index_from_files", + "chunk_file", + "chunk_files", + "get_default_embedder", + "retrieve_top_k", +] diff --git a/gitpilot/rag/chunker.py b/gitpilot/rag/chunker.py new file mode 100644 index 0000000..3eb8ee7 --- /dev/null +++ b/gitpilot/rag/chunker.py @@ -0,0 +1,122 @@ +"""File-to-chunk splitter for the RAG indexer (Batch B7). + +Strategy (simplest viable): + +* **Line-window chunking with overlap.** Each chunk is up to + ``CHUNK_LINES`` source lines (default 40), with ``CHUNK_OVERLAP`` + lines (default 5) of overlap to preserve context across boundaries. +* **Binary / oversize skip.** Files larger than ``MAX_FILE_BYTES`` + or detected as binary are skipped silently. We never want a 10 MB + minified JS or a binary blob to poison the index. +* **Deterministic chunk ids.** ``:``, so + re-indexing the same file produces identical ids and ChromaDB's + upsert keeps the collection tidy. + +The chunker is intentionally language-naive — AST-aware splitting +(tree-sitter per language) is the next refinement once the simpler +approach is working. Even the naive version dramatically outperforms +"read every file" on >100-file repos. 
+""" +from __future__ import annotations + +import hashlib +from dataclasses import dataclass +from typing import Iterable, Iterator, List + +CHUNK_LINES = 40 +CHUNK_OVERLAP = 5 +MAX_FILE_BYTES = 256 * 1024 # 256 KB per file — bigger files are + # almost always generated / minified. + + +@dataclass(frozen=True) +class Chunk: + chunk_id: str + path: str + start_line: int # 1-indexed, inclusive + end_line: int # 1-indexed, inclusive + text: str + file_sha: str # short sha of the source file at chunk time + + +def _short_sha(data: str) -> str: + return hashlib.sha1(data.encode("utf-8", errors="replace")).hexdigest()[:16] + + +def _looks_binary(content: str) -> bool: + """Heuristic — null bytes or a high non-printable ratio in the + first chunk usually means binary. Anything ChromaDB embeds must + be reasonable text.""" + sample = content[:2048] + if "\x00" in sample: + return True + if not sample: + return False + bad = sum( + 1 for c in sample + if not (c.isprintable() or c in "\n\r\t ") + ) + return bad / max(1, len(sample)) > 0.3 + + +def chunk_file( + path: str, + content: str, + *, + chunk_lines: int = CHUNK_LINES, + overlap: int = CHUNK_OVERLAP, +) -> List[Chunk]: + """Split a single file's content into overlapping line windows. + + Returns an empty list when the file is empty, binary, or above + :data:`MAX_FILE_BYTES`. Never raises on bad input. 
+ """ + if not content: + return [] + if len(content.encode("utf-8", errors="replace")) > MAX_FILE_BYTES: + return [] + if _looks_binary(content): + return [] + + chunk_lines = max(5, int(chunk_lines)) + overlap = max(0, min(int(overlap), chunk_lines - 1)) + step = chunk_lines - overlap + + lines = content.splitlines() + if not lines: + return [] + file_sha = _short_sha(content) + path_sha = hashlib.sha1(path.encode("utf-8")).hexdigest()[:12] + + out: List[Chunk] = [] + i = 0 + while i < len(lines): + window = lines[i : i + chunk_lines] + if not window: + break + start = i + 1 + end = i + len(window) + chunk_id = f"{path_sha}:{start}" + out.append( + Chunk( + chunk_id=chunk_id, + path=path, + start_line=start, + end_line=end, + text="\n".join(window), + file_sha=file_sha, + ) + ) + if end >= len(lines): + break + i += step + return out + + +def chunk_files( + files: Iterable[tuple[str, str]], +) -> Iterator[Chunk]: + """Yield chunks across an iterable of (path, content) pairs.""" + for path, content in files: + for chunk in chunk_file(path, content): + yield chunk diff --git a/gitpilot/rag/embedder.py b/gitpilot/rag/embedder.py new file mode 100644 index 0000000..2e6650c --- /dev/null +++ b/gitpilot/rag/embedder.py @@ -0,0 +1,165 @@ +"""Local embedding backends for the RAG pipeline (Batch B7). + +GitPilot is on-prem-first. We refuse to require a cloud API key for +the indexing path. Two backends ship: + +* :class:`DefaultEmbedder` — wraps ChromaDB's bundled + ``all-MiniLM-L6-v2`` ONNX model. ~80 MB downloaded once, 384-dim + vectors, free. This is the production default. +* :class:`HashingEmbedder` — pure-Python, deterministic, zero deps. + Produces a 256-dim sparse-hash representation. Quality is below + MiniLM but the recall is good enough for unit tests and for + environments where ``onnxruntime`` can't be installed. + +Both implement the same :class:`Embedder` Protocol so callers can swap +freely. 
Tests inject :class:`HashingEmbedder` so the suite doesn't +require an 80 MB download. + +The selection function :func:`get_default_embedder` tries the +production embedder first and falls back transparently when the +underlying deps aren't available. +""" +from __future__ import annotations + +import hashlib +import logging +import math +import re +from typing import Iterable, List, Protocol, runtime_checkable + +logger = logging.getLogger(__name__) + +HASHING_DIM = 256 +TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*") + + +@runtime_checkable +class Embedder(Protocol): + """Embedder Protocol — matches ChromaDB's EmbeddingFunction shape.""" + + @property + def name(self) -> str: + ... + + @property + def dim(self) -> int: + ... + + def __call__(self, texts: List[str]) -> List[List[float]]: + ... + + +# ---------------------------------------------------------------------- +# HashingEmbedder — dependency-free fallback +# ---------------------------------------------------------------------- + +class HashingEmbedder: + """Deterministic hash-bucket embedder. + + Tokenises the input on identifier boundaries, hashes each token + into one of :data:`HASHING_DIM` buckets, counts occurrences, then + L2-normalises. Two semantically-similar code snippets that share + identifier vocabulary will produce vectors close in cosine + distance — good enough for "find the file that mentions + ``foo_bar``" without any model download. + """ + name = "hashing-v1" + + def __init__(self, dim: int = HASHING_DIM) -> None: + self._dim = max(32, int(dim)) + + @property + def dim(self) -> int: + return self._dim + + def _embed_one(self, text: str) -> List[float]: + buckets = [0.0] * self._dim + for tok in TOKEN_RE.findall(text.lower()): + h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) + buckets[h % self._dim] += 1.0 + # L2 normalise so cosine similarity stays in [0, 1]. 
+ norm = math.sqrt(sum(b * b for b in buckets)) + if norm == 0.0: + return buckets + return [b / norm for b in buckets] + + def __call__(self, texts: List[str]) -> List[List[float]]: + return [self._embed_one(t) for t in texts] + + +# ---------------------------------------------------------------------- +# DefaultEmbedder — ChromaDB's bundled MiniLM +# ---------------------------------------------------------------------- + +class DefaultEmbedder: + """Wraps Chroma's :class:`DefaultEmbeddingFunction` so it matches + our :class:`Embedder` Protocol. Lazily constructed so we don't + pay the ONNX model load when only HashingEmbedder is used.""" + name = "chromadb-default-minilm-l6-v2" + _ef: object | None = None + + @property + def dim(self) -> int: + # MiniLM-L6-v2 is 384-dim. Hard-coded because Chroma's EF + # doesn't expose this through a stable attribute. + return 384 + + def _load(self) -> object: + if self._ef is None: + try: + from chromadb.utils.embedding_functions import ( + DefaultEmbeddingFunction, + ) + except Exception as exc: + raise RuntimeError( + "DefaultEmbedder requires chromadb + onnxruntime. " + "Install them or use HashingEmbedder instead." + ) from exc + self._ef = DefaultEmbeddingFunction() + return self._ef + + def __call__(self, texts: List[str]) -> List[List[float]]: + ef = self._load() + out = ef(texts) # type: ignore[operator] + # Ensure we return plain lists of floats (some Chroma versions + # return numpy arrays). + return [[float(x) for x in vec] for vec in out] + + +# ---------------------------------------------------------------------- +# Selection +# ---------------------------------------------------------------------- + +def get_default_embedder() -> Embedder: + """Return the production embedder if available, else the hashing + fallback. 
Caller doesn't need to know which one was picked — + both honour the same Protocol.""" + try: + emb = DefaultEmbedder() + # Trigger lazy load up-front so we fail fast if onnxruntime + # is missing — easier to recover than discovering it at the + # first ``__call__``. + emb._load() + return emb + except Exception as exc: + logger.info( + "[rag] falling back to HashingEmbedder (DefaultEmbedder unavailable): %s", + exc, + ) + return HashingEmbedder() + + +def cosine_similarity(a: Iterable[float], b: Iterable[float]) -> float: + """L2-cosine. Used by the in-process retriever for the hashing + backend (ChromaDB handles this internally on its own vectors).""" + al = list(a) + bl = list(b) + if not al or not bl: + return 0.0 + n = min(len(al), len(bl)) + dot = sum(al[i] * bl[i] for i in range(n)) + na = math.sqrt(sum(x * x for x in al[:n])) + nb = math.sqrt(sum(x * x for x in bl[:n])) + if na == 0.0 or nb == 0.0: + return 0.0 + return dot / (na * nb) diff --git a/gitpilot/rag/indexer.py b/gitpilot/rag/indexer.py new file mode 100644 index 0000000..5cac71a --- /dev/null +++ b/gitpilot/rag/indexer.py @@ -0,0 +1,193 @@ +"""Index-builder orchestration for the RAG pipeline (Batch B7). + +Take a list of ``(path, content)`` pairs, run them through the +chunker, push the chunks into the :class:`RagStore`, and persist a +small ``meta.json`` next to the Chroma directory so subsequent runs +can do incremental re-indexing instead of re-embedding everything. + +Public surface is :func:`build_index_from_files` and +:class:`IndexBuildReport`. The function is **synchronous** — +embedding is CPU-bound and we deliberately stay off async so future +batching / multi-process parallelism doesn't fight an event loop. 
+""" +from __future__ import annotations + +import hashlib +import json +import logging +from dataclasses import dataclass, field +from datetime import UTC, datetime +from pathlib import Path +from typing import Iterable, Optional + +from .chunker import chunk_file +from .embedder import Embedder, get_default_embedder +from .store import RagStore, _persist_dir + +logger = logging.getLogger(__name__) + + +@dataclass +class IndexMeta: + """On-disk header for an index — small JSON next to the Chroma dir.""" + owner: str + repo: str + branch: str + embedder: str + embedding_dim: int + indexed_files: dict[str, str] = field(default_factory=dict) # path -> file_sha + updated_at: str = field( + default_factory=lambda: datetime.now(UTC).isoformat(), + ) + + @classmethod + def load(cls, persist_dir: Path) -> Optional["IndexMeta"]: + path = persist_dir / "meta.json" + if not path.exists(): + return None + try: + raw = json.loads(path.read_text(encoding="utf-8")) + except Exception: + return None + try: + return cls( + owner=str(raw.get("owner", "") or ""), + repo=str(raw.get("repo", "") or ""), + branch=str(raw.get("branch", "") or ""), + embedder=str(raw.get("embedder", "") or ""), + embedding_dim=int(raw.get("embedding_dim", 0) or 0), + indexed_files={ + str(k): str(v) + for k, v in (raw.get("indexed_files") or {}).items() + }, + updated_at=str(raw.get("updated_at", "") or ""), + ) + except Exception: + return None + + def save(self, persist_dir: Path) -> None: + persist_dir.mkdir(parents=True, exist_ok=True) + (persist_dir / "meta.json").write_text( + json.dumps( + { + "owner": self.owner, + "repo": self.repo, + "branch": self.branch, + "embedder": self.embedder, + "embedding_dim": self.embedding_dim, + "indexed_files": self.indexed_files, + "updated_at": self.updated_at, + }, + indent=2, + ), + encoding="utf-8", + ) + + +@dataclass +class IndexBuildReport: + files_seen: int = 0 + files_indexed: int = 0 # actually re-embedded this run + files_skipped: int = 0 # unchanged 
since last index + chunks_added: int = 0 + embedder_name: str = "" + embedding_dim: int = 0 + + +def _file_sha(content: str) -> str: + return hashlib.sha1(content.encode("utf-8", errors="replace")).hexdigest()[:16] + + +def build_index_from_files( + files: Iterable[tuple[str, str]], + *, + owner: str, + repo: str, + branch: str, + embedder: Optional[Embedder] = None, + persist_dir: Optional[Path] = None, + force_full_rebuild: bool = False, +) -> IndexBuildReport: + """Index a batch of files into the per-(owner/repo/branch) store. + + Incremental: a file whose content hasn't changed since the last + build (matching ``file_sha`` in ``meta.json``) is skipped entirely + — no re-chunking, no re-embedding. + + ``force_full_rebuild=True`` deletes existing chunks and re-indexes + everything. Used by /api/repos/.../index/build with force=True. + """ + emb = embedder or get_default_embedder() + pdir = persist_dir or _persist_dir(owner, repo, branch) + store = RagStore( + owner=owner, repo=repo, branch=branch, + embedder=emb, persist_dir=pdir, + ) + meta = IndexMeta.load(pdir) or IndexMeta( + owner=owner, repo=repo, branch=branch, + embedder=emb.name, embedding_dim=emb.dim, + ) + if meta.embedder != emb.name or meta.embedding_dim != emb.dim: + # Embedder changed since last build — vectors are incomparable + # so we must rebuild from scratch. 
+ logger.info( + "[rag] embedder changed (%s/%d → %s/%d) — full rebuild", + meta.embedder, meta.embedding_dim, emb.name, emb.dim, + ) + force_full_rebuild = True + meta = IndexMeta( + owner=owner, repo=repo, branch=branch, + embedder=emb.name, embedding_dim=emb.dim, + ) + + report = IndexBuildReport( + embedder_name=emb.name, + embedding_dim=emb.dim, + ) + + new_indexed: dict[str, str] = {} if force_full_rebuild else dict(meta.indexed_files) + + pending_chunks = [] + for path, content in files: + if not path or content is None: + continue + report.files_seen += 1 + sha = _file_sha(content) + old_sha = meta.indexed_files.get(path) + if not force_full_rebuild and old_sha == sha: + report.files_skipped += 1 + continue + + # Drop stale chunks for this file before adding fresh ones. + if old_sha is not None: + store.delete_by_path(path) + + chunks = chunk_file(path, content) + if not chunks: + # File was empty / binary / too large. Drop it from the + # index entirely so we don't keep referencing stale chunks. + new_indexed.pop(path, None) + continue + pending_chunks.extend(chunks) + new_indexed[path] = sha + report.files_indexed += 1 + + if force_full_rebuild: + # Drop everything for paths NOT in the new set as well — handles + # files that disappeared. + for stale_path in set(meta.indexed_files) - set(new_indexed): + store.delete_by_path(stale_path) + + if pending_chunks: + report.chunks_added = store.add_chunks(pending_chunks) + + meta.indexed_files = new_indexed + meta.embedder = emb.name + meta.embedding_dim = emb.dim + meta.updated_at = datetime.now(UTC).isoformat() + meta.save(pdir) + + return report + + +__all__ = ["IndexMeta", "IndexBuildReport", "build_index_from_files"] diff --git a/gitpilot/rag/retriever.py b/gitpilot/rag/retriever.py new file mode 100644 index 0000000..c3d9bbb --- /dev/null +++ b/gitpilot/rag/retriever.py @@ -0,0 +1,132 @@ +"""Top-k semantic retrieval for the RAG pipeline (Batch B7). 
+ +Public function :func:`retrieve_top_k` and dataclass +:class:`RetrievedChunk`. Thin wrapper over :class:`RagStore.query` +that also applies a simple Maximum-Marginal-Relevance (MMR) re-rank +when ``mmr=True`` so the agent doesn't get N near-duplicates from +the same file. +""" +from __future__ import annotations + +import logging +from dataclasses import dataclass +from pathlib import Path +from typing import List, Optional + +from .embedder import Embedder, cosine_similarity, get_default_embedder +from .store import QueryHit, RagStore, _persist_dir + +logger = logging.getLogger(__name__) + + +@dataclass(frozen=True) +class RetrievedChunk: + path: str + start_line: int + end_line: int + text: str + score: float + + +def _to_retrieved(h: QueryHit) -> RetrievedChunk: + return RetrievedChunk( + path=h.path, + start_line=h.start_line, + end_line=h.end_line, + text=h.text, + score=h.score, + ) + + +def _mmr_rerank( + query_vec: List[float], + candidates: List[QueryHit], + *, + k: int, + lambda_: float = 0.7, + embedder: Embedder, +) -> List[QueryHit]: + """Maximum Marginal Relevance — pick a diverse top-k that still + ranks by similarity to the query. ``lambda_`` weights relevance + vs. novelty: 1.0 = pure relevance, 0.0 = pure diversity.""" + if not candidates or k <= 0: + return [] + # Pre-embed the candidate texts so we can compute pairwise novelty. 
+ texts = [c.text for c in candidates] + vecs = embedder(texts) + + selected: List[int] = [] + remaining = list(range(len(candidates))) + while remaining and len(selected) < k: + best_idx = remaining[0] + best_score = -1e9 + for idx in remaining: + rel = cosine_similarity(query_vec, vecs[idx]) + if selected: + novelty = max( + cosine_similarity(vecs[idx], vecs[s]) + for s in selected + ) + else: + novelty = 0.0 + score = lambda_ * rel - (1 - lambda_) * novelty + if score > best_score: + best_score = score + best_idx = idx + selected.append(best_idx) + remaining.remove(best_idx) + return [candidates[i] for i in selected] + + +def retrieve_top_k( + query: str, + *, + owner: str, + repo: str, + branch: str, + k: int = 8, + embedder: Optional[Embedder] = None, + persist_dir: Optional[Path] = None, + mmr: bool = True, +) -> List[RetrievedChunk]: + """Return the k most-relevant chunks across the persisted index. + + Returns an empty list (silently) when: + + * the persist dir doesn't exist yet (no index built), + * the embedder can't be initialised, + * any internal error in ChromaDB. + + Callers should treat "no results" as "fall back to other tools", + not as an error. + """ + if not query or k <= 0: + return [] + emb = embedder or get_default_embedder() + pdir = persist_dir or _persist_dir(owner, repo, branch) + if not pdir.exists(): + return [] + + try: + store = RagStore( + owner=owner, repo=repo, branch=branch, + embedder=emb, persist_dir=pdir, + ) + except Exception as exc: + logger.debug("[rag] retriever: store init failed: %s", exc) + return [] + + # Over-fetch when MMR is on so re-ranking has something to chew on. 
+ over_k = max(k, k * 3) if mmr else k + hits = store.query(query, k=over_k) + if not hits: + return [] + if mmr and len(hits) > k: + qv = emb([query])[0] + hits = _mmr_rerank(qv, hits, k=k, embedder=emb) + else: + hits = hits[:k] + return [_to_retrieved(h) for h in hits] + + +__all__ = ["RetrievedChunk", "retrieve_top_k"] diff --git a/gitpilot/rag/store.py b/gitpilot/rag/store.py new file mode 100644 index 0000000..e92a133 --- /dev/null +++ b/gitpilot/rag/store.py @@ -0,0 +1,201 @@ +"""ChromaDB-backed persistent store for the RAG pipeline (Batch B7). + +Wraps a per-(owner, repo, branch) Chroma collection so callers don't +have to think about embedder wiring, persistence paths, or upsert +semantics. Storage layout: + + //// + chroma.sqlite3 + hnsw segments (ChromaDB persistent client) + +There is no in-memory mode: when ``persist_dir`` is ``None`` the store +resolves the default directory under :func:`rag_root`. Tests can isolate +themselves by passing a temporary ``persist_dir`` or by setting ``GITPILOT_RAG_ROOT``. + +Embedder is injected at construction so tests can use the dependency- +free :class:`HashingEmbedder` while production uses +:class:`DefaultEmbedder`. +""" +from __future__ import annotations + +import logging +import os +import re +from dataclasses import dataclass +from pathlib import Path +from typing import Iterable, List, Optional + +from .embedder import Embedder + +logger = logging.getLogger(__name__) + +# Override via env so tests can isolate (and CI can dump on cleanup). 
+RAG_ROOT_ENV = "GITPILOT_RAG_ROOT" +_DEFAULT_RAG_ROOT = Path.home() / ".gitpilot" / "rag" + + +def rag_root() -> Path: + override = os.environ.get(RAG_ROOT_ENV) + if override: + return Path(override) + return _DEFAULT_RAG_ROOT + + +def _persist_dir(owner: str, repo: str, branch: str) -> Path: + return rag_root() / _sanitize(owner) / _sanitize(repo) / _sanitize(branch) + + +def _sanitize(s: str) -> str: + """Make a path segment safe for any filesystem.""" + return re.sub(r"[^A-Za-z0-9._-]+", "_", s)[:80] or "_" + + +def _collection_name(owner: str, repo: str, branch: str) -> str: + """ChromaDB collection names must be 3–512 chars, alphanumeric + + underscore/hyphen, start/end alphanumeric. Build a deterministic + name that meets the rules.""" + raw = f"gp_{_sanitize(owner)}_{_sanitize(repo)}_{_sanitize(branch)}" + # Ensure start/end are alphanumeric. + raw = re.sub(r"^_+", "", raw) + raw = re.sub(r"_+$", "", raw) + return raw[:500] or "gp_default" + + +@dataclass(frozen=True) +class QueryHit: + chunk_id: str + path: str + start_line: int + end_line: int + text: str + score: float + + +class RagStore: + """Thin wrapper around a ChromaDB persistent collection.""" + + def __init__( + self, + *, + owner: str, + repo: str, + branch: str, + embedder: Embedder, + persist_dir: Optional[Path] = None, + ) -> None: + import chromadb # lazy import — heavy module + + self.owner = owner + self.repo = repo + self.branch = branch + self.embedder = embedder + self._collection_name = _collection_name(owner, repo, branch) + + if persist_dir is None: + persist_dir = _persist_dir(owner, repo, branch) + persist_dir.mkdir(parents=True, exist_ok=True) + self.persist_dir = persist_dir + + self._client = chromadb.PersistentClient(path=str(persist_dir)) + # ChromaDB will compute embeddings itself if we hand it our + # embedder via the ``embedding_function`` arg, but its newer + # versions are picky about the shape. 
We compute vectors + # ourselves and pass them via ``embeddings=`` on add/query — + # makes the store backend-agnostic. + self._collection = self._client.get_or_create_collection( + name=self._collection_name, + metadata={"hnsw:space": "cosine"}, + ) + + # ------------------------------------------------------------------ + # Mutation + # ------------------------------------------------------------------ + def add_chunks( + self, + chunks: Iterable[object], # avoid hard import cycle with chunker + ) -> int: + ids: List[str] = [] + documents: List[str] = [] + metadatas: List[dict[str, object]] = [] + for c in chunks: + ids.append(c.chunk_id) # type: ignore[attr-defined] + documents.append(c.text) # type: ignore[attr-defined] + metadatas.append({ + "path": c.path, # type: ignore[attr-defined] + "start_line": c.start_line, # type: ignore[attr-defined] + "end_line": c.end_line, # type: ignore[attr-defined] + "file_sha": c.file_sha, # type: ignore[attr-defined] + }) + if not ids: + return 0 + vectors = self.embedder(documents) + self._collection.upsert( + ids=ids, + documents=documents, + metadatas=metadatas, # type: ignore[arg-type] + embeddings=vectors, # type: ignore[arg-type] + ) + return len(ids) + + def delete_by_path(self, path: str) -> int: + """Drop every chunk for one source file (e.g. 
file removed + from the repo, or about to be re-indexed).""" + try: + res = self._collection.get(where={"path": path}) + ids = res.get("ids", []) if isinstance(res, dict) else [] + if ids: + self._collection.delete(ids=ids) + return len(ids or []) + except Exception as exc: + logger.debug("[rag] delete_by_path %s failed: %s", path, exc) + return 0 + + def count(self) -> int: + try: + return int(self._collection.count()) + except Exception: + return 0 + + # ------------------------------------------------------------------ + # Retrieval + # ------------------------------------------------------------------ + def query(self, text: str, *, k: int = 8) -> List[QueryHit]: + if not text or k <= 0: + return [] + if self.count() == 0: + return [] + vec = self.embedder([text])[0] + try: + res = self._collection.query( + query_embeddings=[vec], # type: ignore[arg-type] + n_results=max(1, int(k)), + ) + except Exception as exc: + logger.debug("[rag] query failed: %s", exc) + return [] + + # Chroma returns parallel lists per query — we always pass one. 
+ ids_list = (res.get("ids") or [[]])[0] + docs_list = (res.get("documents") or [[]])[0] + metas_list = (res.get("metadatas") or [[]])[0] + dists_list = (res.get("distances") or [[]])[0] + + out: List[QueryHit] = [] + for i, cid in enumerate(ids_list): + doc = docs_list[i] if i < len(docs_list) else "" + meta = metas_list[i] if i < len(metas_list) and metas_list[i] else {} + dist = float(dists_list[i]) if i < len(dists_list) else 1.0 + score = max(0.0, 1.0 - dist) + out.append( + QueryHit( + chunk_id=str(cid), + path=str(meta.get("path", "")), + start_line=int(meta.get("start_line", 0) or 0), # type: ignore[arg-type] + end_line=int(meta.get("end_line", 0) or 0), # type: ignore[arg-type] + text=str(doc or ""), + score=score, + ) + ) + return out + + +__all__ = ["RagStore", "QueryHit", "rag_root"] diff --git a/gitpilot/rag_consent.py b/gitpilot/rag_consent.py new file mode 100644 index 0000000..f12e949 --- /dev/null +++ b/gitpilot/rag_consent.py @@ -0,0 +1,156 @@ +"""Per-repo consent for the local RAG index (Batch B9). + +Storage: ``~/.gitpilot/rag///.consent`` — a small JSON +file with the grant timestamp and identity. Branch-agnostic on +purpose: building a semantic index is a repo-level decision, and we +want the second branch of the same repo to inherit consent. + +Three operations the router needs: + +* :func:`has_consent` — fast: returns ``True`` iff the consent file + exists and is well-formed. +* :func:`grant_consent` — writes the file (idempotent). Called when + the user approves an INDEX plan step. +* :func:`revoke_consent` — deletes the file *and* removes the + persisted index directory. Called from Settings → Provider. + +All paths are sanitised via the same helper the RAG store uses, so +"weird" owner/repo names (with `/`, spaces, etc.) won't poke outside +the consent root. 
+""" +from __future__ import annotations + +import json +import logging +import os +import shutil +from dataclasses import asdict, dataclass +from datetime import UTC, datetime +from pathlib import Path +from typing import Optional + +from .rag.store import rag_root + +logger = logging.getLogger(__name__) + +CONSENT_FILE = ".consent" + + +@dataclass(frozen=True) +class ConsentRecord: + """Round-trip JSON shape for the consent file.""" + granted_at: str # ISO-8601 UTC + granted_by: Optional[str] = None # username / actor id if available + + +def _consent_dir(owner: str, repo: str) -> Path: + """Consent lives at the repo level (no branch segment) so all + branches of the same repo share the answer.""" + from .rag.store import _sanitize # local import — sanitiser kept + # private to store.py + return rag_root() / _sanitize(owner) / _sanitize(repo) + + +def _consent_path(owner: str, repo: str) -> Path: + return _consent_dir(owner, repo) / CONSENT_FILE + + +# ---------------------------------------------------------------------- +# Public API +# ---------------------------------------------------------------------- + +def has_consent(owner: str, repo: str) -> bool: + """Return ``True`` iff the user has previously approved indexing + for ``owner/repo``. Malformed / unreadable files count as "no + consent" — fail closed.""" + if not owner or not repo: + return False + path = _consent_path(owner, repo) + if not path.exists(): + return False + try: + raw = json.loads(path.read_text(encoding="utf-8")) + except Exception: + return False + # Minimum shape check. Anything else weird → no consent. + return isinstance(raw, dict) and isinstance(raw.get("granted_at"), str) + + +def grant_consent( + owner: str, + repo: str, + *, + granted_by: Optional[str] = None, +) -> ConsentRecord: + """Record consent for ``owner/repo``. 
Idempotent: calling twice + updates the timestamp but doesn't re-prompt the user.""" + if not owner or not repo: + raise ValueError("grant_consent: owner and repo are required") + record = ConsentRecord( + granted_at=datetime.now(UTC).isoformat(), + granted_by=granted_by, + ) + cdir = _consent_dir(owner, repo) + cdir.mkdir(parents=True, exist_ok=True) + _consent_path(owner, repo).write_text( + json.dumps(asdict(record), indent=2), + encoding="utf-8", + ) + return record + + +def revoke_consent(owner: str, repo: str) -> bool: + """Delete the consent file AND the persisted index for the repo. + + Returns ``True`` if anything was actually deleted, ``False`` if + there was nothing to revoke (already absent). Never raises on a + missing path — revocation is intent, not assertion. + """ + if not owner or not repo: + return False + cdir = _consent_dir(owner, repo) + removed = False + cpath = _consent_path(owner, repo) + if cpath.exists(): + try: + cpath.unlink() + removed = True + except OSError as exc: + logger.debug("[rag-consent] could not unlink %s: %s", cpath, exc) + # Wipe every per-branch index directory under this repo. The + # consent record was repo-level, the indexes are per-branch, so + # we recurse through immediate subdirectories. + if cdir.exists(): + try: + for entry in cdir.iterdir(): + if entry.is_dir(): + shutil.rmtree(entry, ignore_errors=True) + removed = True + except OSError as exc: + logger.debug("[rag-consent] could not iterate %s: %s", cdir, exc) + return removed + + +def load_record(owner: str, repo: str) -> Optional[ConsentRecord]: + """Return the persisted ConsentRecord, or ``None`` if absent / + malformed. 
Callers that need to surface "consented since 2025- + 01-02" can use this without writing their own JSON parsing.""" + if not has_consent(owner, repo): + return None + try: + raw = json.loads(_consent_path(owner, repo).read_text(encoding="utf-8")) + return ConsentRecord( + granted_at=str(raw.get("granted_at", "")), + granted_by=raw.get("granted_by"), + ) + except Exception: + return None + + +__all__ = [ + "ConsentRecord", + "grant_consent", + "has_consent", + "load_record", + "revoke_consent", +] diff --git a/gitpilot/repo_map.py b/gitpilot/repo_map.py new file mode 100644 index 0000000..6f817f5 --- /dev/null +++ b/gitpilot/repo_map.py @@ -0,0 +1,375 @@ +"""Hierarchical repository map (Batch B6). + +Generates a compact, factual "site map" of a repository — what +languages, what top-level modules, which files are entry points — +and persists it so every planner prompt can be primed with the same +high-level overview without re-discovering it each turn. + +Inspired by Aider's repo-map, Cursor's project context, and the +``AGENTS.md`` convention. This implementation is fully local: no +LLM call needed. We read the file tree, count extensions, identify +"key" files via well-known names, and emit a markdown blob bounded +by a hard token budget (default 500). + +Storage: + ~/.gitpilot/repo_maps/.json + +Invalidation: + Stored alongside the commit SHA the map was built from. When the + branch's HEAD moves, callers can detect the staleness via + ``RepoMap.commit_sha`` and refresh. + +Wiring: + Phase 6 of the enterprise roadmap. The next batch will inject + ``repo_map.agents_md`` into the planner's backstory through the + existing ``context_pack`` slot in ``generate_plan``. +""" +from __future__ import annotations + +import hashlib +import json +import logging +from collections import Counter +from dataclasses import asdict, dataclass, field +from datetime import UTC, datetime +from pathlib import Path +from typing import Callable, Iterable, List, Optional + +from . 
import flags +from .context_budget import estimate_tokens + +logger = logging.getLogger(__name__) + +FLAG_REPO_MAP = "repo_map" + +DEFAULT_MAP_TOKEN_BUDGET = 500 +MAP_MAX_KEY_FILES = 10 +MAP_MAX_MODULES = 12 +MAP_MAX_FILES_PER_MODULE = 6 +MAPS_DIR_ENV = "GITPILOT_REPO_MAPS_DIR" + +# Files we always lift into "key files" when present. Ordered by +# importance — first match wins for ranking. +_WELL_KNOWN_KEY_FILES: tuple[str, ...] = ( + "README.md", + "README.rst", + "README", + "AGENTS.md", + "CLAUDE.md", + "pyproject.toml", + "package.json", + "Cargo.toml", + "go.mod", + "pom.xml", + "build.gradle", + "Dockerfile", + "docker-compose.yml", + "Makefile", + ".github/workflows/ci.yml", + "LICENSE", + "CHANGELOG.md", +) + + +def _coerce_int_safe(value: object) -> int: + if isinstance(value, bool): + return 0 + if isinstance(value, int): + return value + if isinstance(value, str): + try: + return int(value) + except ValueError: + return 0 + return 0 + + +def _coerce_str_list(value: object) -> List[str]: + if not isinstance(value, list): + return [] + return [str(x) for x in value if x is not None] + + +def _coerce_lang_counts(value: object) -> dict[str, int]: + if not isinstance(value, dict): + return {} + return {str(k): _coerce_int_safe(v) for k, v in value.items()} + + +@dataclass +class ModuleSummary: + path: str # directory path, e.g. 
"src/util" + files: List[str] = field(default_factory=list) + file_count: int = 0 + + +@dataclass +class RepoMap: + """In-memory + on-disk representation of a repo's site map.""" + owner: str + repo: str + branch: str + commit_sha: Optional[str] = None + generated_at: str = field( + default_factory=lambda: datetime.now(UTC).isoformat(), + ) + languages: dict[str, int] = field(default_factory=dict) + key_files: List[str] = field(default_factory=list) + modules: List[ModuleSummary] = field(default_factory=list) + total_files: int = 0 + agents_md: str = "" + + def to_dict(self) -> dict[str, object]: + return asdict(self) + + @classmethod + def from_dict(cls, data: dict[str, object]) -> "RepoMap": + raw_modules = data.get("modules", []) or [] + modules: List[ModuleSummary] = [] + if isinstance(raw_modules, list): + for m in raw_modules: + if isinstance(m, dict): + modules.append(ModuleSummary( + path=str(m.get("path", "") or ""), + files=_coerce_str_list(m.get("files")), + file_count=_coerce_int_safe(m.get("file_count")), + )) + out = cls( + owner=str(data.get("owner", "") or ""), + repo=str(data.get("repo", "") or ""), + branch=str(data.get("branch", "") or ""), + commit_sha=( + str(data["commit_sha"]) if data.get("commit_sha") is not None else None + ), + generated_at=str(data.get("generated_at", "") or ""), + languages=_coerce_lang_counts(data.get("languages")), + key_files=_coerce_str_list(data.get("key_files")), + modules=modules, + total_files=_coerce_int_safe(data.get("total_files")), + agents_md=str(data.get("agents_md", "") or ""), + ) + return out + + +# ---------------------------------------------------------------------- +# Storage helpers +# ---------------------------------------------------------------------- + +def _maps_root() -> Path: + import os + + override = os.environ.get(MAPS_DIR_ENV) + if override: + return Path(override) + return Path.home() / ".gitpilot" / "repo_maps" + + +def _cache_key(owner: str, repo: str, branch: str) -> str: + raw = 
f"{owner}/{repo}@{branch}".encode("utf-8") + return hashlib.sha1(raw).hexdigest()[:24] + + +def _cache_path(owner: str, repo: str, branch: str) -> Path: + return _maps_root() / f"{_cache_key(owner, repo, branch)}.json" + + +def load_cached(owner: str, repo: str, branch: str) -> Optional[RepoMap]: + path = _cache_path(owner, repo, branch) + if not path.exists(): + return None + try: + data = json.loads(path.read_text(encoding="utf-8")) + return RepoMap.from_dict(data) + except Exception as exc: + logger.debug("[repo-map] could not load %s: %s", path, exc) + return None + + +def save_cached(repo_map: RepoMap) -> None: + root = _maps_root() + root.mkdir(parents=True, exist_ok=True) + path = _cache_path(repo_map.owner, repo_map.repo, repo_map.branch) + try: + path.write_text(json.dumps(repo_map.to_dict(), indent=2), encoding="utf-8") + except Exception as exc: # pragma: no cover - defensive + logger.debug("[repo-map] could not save %s: %s", path, exc) + + +# ---------------------------------------------------------------------- +# Map builder +# ---------------------------------------------------------------------- + +def _extension_of(path: str) -> str: + name = path.rsplit("/", 1)[-1] + if "." not in name: + return "(no-ext)" + return name.rsplit(".", 1)[-1].lower() + + +def _top_dir_of(path: str) -> str: + """Return the first directory segment of a path; '' for root files.""" + if "/" not in path: + return "" + return path.split("/", 1)[0] + + +def _group_into_modules(paths: List[str]) -> List[ModuleSummary]: + """Group files by top-level directory. 
Root-level files form a + "(root)" pseudo-module so they're still visible.""" + buckets: dict[str, List[str]] = {} + for p in sorted(paths): + top = _top_dir_of(p) or "(root)" + buckets.setdefault(top, []).append(p) + + modules: List[ModuleSummary] = [] + for name, files in buckets.items(): + modules.append( + ModuleSummary( + path=name, + files=files[:MAP_MAX_FILES_PER_MODULE], + file_count=len(files), + ) + ) + # Sort by file count descending so the most-populated modules + # come first; these are the ones the planner most needs to see. + modules.sort(key=lambda m: (-m.file_count, m.path)) + return modules[:MAP_MAX_MODULES] + + +def _select_key_files(paths: List[str]) -> List[str]: + by_name = {p.rsplit("/", 1)[-1]: p for p in paths} + selected: List[str] = [] + for well_known in _WELL_KNOWN_KEY_FILES: + # Stop at the cap first so the ``continue`` branches cannot skip it. + if len(selected) >= MAP_MAX_KEY_FILES: + break + # Match either the bare name at any depth or the exact path. + if well_known in paths: + selected.append(well_known) + continue + if "/" in well_known: + # Path-like names only match exactly (handled above). + continue + if well_known in by_name: + selected.append(by_name[well_known]) + return selected + + +def _render_agents_md(repo_map: RepoMap, *, token_budget: int) -> str: + """Render the markdown blob that gets pinned into planner prompts. + + Best-effort bound of ``token_budget``: if the first-pass output overshoots + we trim modules from the tail (least-populated), never below three. 
+ """ + def _build(modules: List[ModuleSummary]) -> str: + lines: list[str] = [] + lines.append(f"# Repository map — `{repo_map.owner}/{repo_map.repo}` @ `{repo_map.branch}`") + lines.append("") + lines.append(f"**Total files:** {repo_map.total_files}") + if repo_map.languages: + top = sorted(repo_map.languages.items(), key=lambda kv: -kv[1])[:8] + lines.append( + "**Languages:** " + ", ".join(f"`{ext}`={n}" for ext, n in top) + ) + if repo_map.key_files: + lines.append("") + lines.append("## Key files") + for kf in repo_map.key_files: + lines.append(f"- `{kf}`") + if modules: + lines.append("") + lines.append("## Modules") + for mod in modules: + lines.append(f"- **`{mod.path}/`** — {mod.file_count} file(s)") + for f in mod.files: + lines.append(f" - `{f}`") + lines.append("") + lines.append( + "_Use the `Find files matching a pattern`, `Search file contents` " + "and `Read file content` tools to drill into anything above._" + ) + return "\n".join(lines) + + modules = list(repo_map.modules) + out = _build(modules) + while estimate_tokens(out) > token_budget and len(modules) > 3: + # Drop the least-populated module and try again. + modules = modules[:-1] + out = _build(modules) + return out + + +def build_repo_map( + *, + owner: str, + repo: str, + branch: str, + paths: Iterable[str], + commit_sha: Optional[str] = None, + token_budget: int = DEFAULT_MAP_TOKEN_BUDGET, +) -> RepoMap: + """Deterministically construct a :class:`RepoMap` from a list of + repository file paths. Pure function — no I/O, no network, no + LLM call. Caller is responsible for fetching the paths (today + that's ``get_repo_tree`` for GitHub mode or ``Path.rglob`` for + local mode). 
+ """ + files = sorted({p.strip() for p in paths if p and isinstance(p, str)}) + languages = dict(Counter(_extension_of(p) for p in files)) + key_files = _select_key_files(files) + modules = _group_into_modules(files) + + repo_map = RepoMap( + owner=owner, + repo=repo, + branch=branch, + commit_sha=commit_sha, + languages=languages, + key_files=key_files, + modules=modules, + total_files=len(files), + ) + repo_map.agents_md = _render_agents_md(repo_map, token_budget=token_budget) + return repo_map + + +def get_or_build_repo_map( + *, + owner: str, + repo: str, + branch: str, + paths_provider: Callable[[], Iterable[str]], + commit_sha: Optional[str] = None, + token_budget: int = DEFAULT_MAP_TOKEN_BUDGET, + force: bool = False, +) -> RepoMap: + """Return a cached map if it's still valid for the current commit, + otherwise build a fresh one and persist. ``paths_provider`` is + a zero-arg callable that returns ``Iterable[str]`` of repo paths — + keeps this function independent of how paths are fetched. + """ + if not force: + cached = load_cached(owner, repo, branch) + if cached and cached.commit_sha == commit_sha and commit_sha is not None: + return cached + + paths = list(paths_provider()) + fresh = build_repo_map( + owner=owner, repo=repo, branch=branch, + paths=paths, commit_sha=commit_sha, + token_budget=token_budget, + ) + save_cached(fresh) + return fresh + + +__all__ = [ + "FLAG_REPO_MAP", + "DEFAULT_MAP_TOKEN_BUDGET", + "ModuleSummary", + "RepoMap", + "build_repo_map", + "get_or_build_repo_map", + "load_cached", + "save_cached", +] diff --git a/gitpilot/sandbox_api.py b/gitpilot/sandbox_api.py new file mode 100644 index 0000000..54d09fb --- /dev/null +++ b/gitpilot/sandbox_api.py @@ -0,0 +1,666 @@ +"""HTTP surface for the sandbox runtime switch. + +Three endpoints, all additive: + +* ``GET /api/sandbox/status`` — what's configured, can we reach it? +* ``PUT /api/sandbox/config`` — update the persisted SandboxSettings. 
+* ``POST /api/sandbox/run`` — execute one ``{language, code}`` snippet + through the currently-selected backend. + +The chat UI uses :func:`run_snippet` to power the per-codeblock Run button +introduced in the AssistantMessage component, so a user can ask "write a +hello-world in Python", click Run, and see the output inline without +leaving GitPilot. Which sandbox actually executes the snippet (local +subprocess vs MatrixLab Runner) is controlled by Settings → Sandbox +Runtime. + +Routes are mounted from :mod:`gitpilot.api` via ``app.include_router``. +""" +from __future__ import annotations + +import asyncio +import logging +import os +import shlex +import shutil +import tempfile +import time +from pathlib import Path +from typing import Any, Dict, List, Optional + +import httpx +from fastapi import APIRouter, HTTPException +from pydantic import BaseModel, Field + +from .sandbox import ( + BACKEND_MATRIXLAB, + BACKEND_OFF, + BACKEND_SUBPROCESS, + DEFAULT_TIMEOUT_SEC, + SandboxPolicy, + SandboxResult, + SandboxUnavailableError, + SandboxRunError, + MatrixLabSandbox, + NullSandbox, + SubprocessSandbox, +) +from .settings import AppSettings, SandboxSettings, get_settings, update_settings + +logger = logging.getLogger(__name__) +router = APIRouter(prefix="/api/sandbox", tags=["sandbox"]) + +# How each fenced-code language is launched inside the sandbox. Anything +# the user can opt-in to from the Run button has to be listed here; the +# whitelist keeps random shells (``ruby``, ``perl``, ...) from being +# silently executed just because an LLM tagged a fence with that name. 
+LANGUAGE_RUNNERS: Dict[str, Dict[str, Any]] = { + "python": {"suffix": ".py", "argv": ["python3", "{file}"]}, + "py": {"suffix": ".py", "argv": ["python3", "{file}"]}, + "javascript": {"suffix": ".js", "argv": ["node", "{file}"]}, + "js": {"suffix": ".js", "argv": ["node", "{file}"]}, + "node": {"suffix": ".js", "argv": ["node", "{file}"]}, + "bash": {"suffix": ".sh", "argv": ["bash", "{file}"]}, + "sh": {"suffix": ".sh", "argv": ["bash", "{file}"]}, + "shell": {"suffix": ".sh", "argv": ["bash", "{file}"]}, +} + +ALLOWED_BACKENDS = {BACKEND_OFF, BACKEND_SUBPROCESS, BACKEND_MATRIXLAB} + + +# ---------------------------------------------------------------------- +# Request / response models +# ---------------------------------------------------------------------- + +class SandboxStatusResponse(BaseModel): + backend: str + available_backends: list[str] + matrixlab_url: str + matrixlab_image: str + allow_network: bool + timeout_sec: int + has_token: bool + ok: bool + error: Optional[str] = None + remote: Optional[Dict[str, Any]] = None + # Name of the env var currently shadowing the persisted backend + # choice, if any. Used by the Settings panel to render an "env + # override" badge so users understand why their UI selection isn't + # taking effect. ``None`` when persistence is authoritative. 
+ env_override: Optional[str] = None + + +class SandboxConfigUpdate(BaseModel): + backend: Optional[str] = None + matrixlab_url: Optional[str] = None + matrixlab_token: Optional[str] = None + matrixlab_image: Optional[str] = None + allow_network: Optional[bool] = None + timeout_sec: Optional[int] = Field(default=None, ge=1, le=600) + + +class SandboxRunRequest(BaseModel): + language: str + code: str + timeout_sec: Optional[int] = Field(default=None, ge=1, le=600) + + +class SandboxRunResponse(BaseModel): + backend: str + language: str + command: str + exit_code: int + stdout: str + stderr: str + duration_ms: int + truncated: bool = False + timed_out: bool = False + sandbox_id: Optional[str] = None + + +# ---------------------------------------------------------------------- +# Helpers +# ---------------------------------------------------------------------- + +def _build_sandbox(cfg: SandboxSettings, *, workspace: Path, timeout: int): + """Construct the right sandbox instance from persisted settings. + + Distinct from :func:`gitpilot.sandbox.get_sandbox` because that + factory reads ``settings={"tools": {"sandbox": ...}}`` for backwards + compatibility with the older shape; here we already have the typed + :class:`SandboxSettings` so the indirection isn't needed. + """ + policy = SandboxPolicy( + workspace=workspace, + timeout_sec=timeout, + allow_network=cfg.allow_network, + image=cfg.matrixlab_image or None, + ) + backend = (cfg.backend or BACKEND_SUBPROCESS).strip().lower() + if backend == BACKEND_OFF: + return NullSandbox(policy) + if backend == BACKEND_MATRIXLAB: + return MatrixLabSandbox( + policy, + base_url=cfg.matrixlab_url or None, + token=cfg.matrixlab_token or None, + ) + return SubprocessSandbox(policy) + + +def _detect_env_override() -> Optional[str]: + """Return the env var name currently shadowing the persisted backend + choice, or None if persistence wins. 
Mirrors the precedence rules + in :func:`gitpilot.sandbox._resolve_backend_name` so what we surface + in the UI matches what actually executes.""" + import os as _os + + for name in ( + "GITPILOT_SANDBOX", + "GITPILOT_MATRIXLAB_URL", + "GITPILOT_MATRIXLAB_TOKEN", + "GITPILOT_MATRIXLAB_IMAGE", + ): + if _os.environ.get(name): + return name + return None + + +def _status_from(cfg: SandboxSettings, health: Dict[str, Any]) -> SandboxStatusResponse: + return SandboxStatusResponse( + backend=cfg.backend, + available_backends=sorted(ALLOWED_BACKENDS), + matrixlab_url=cfg.matrixlab_url, + matrixlab_image=cfg.matrixlab_image, + allow_network=cfg.allow_network, + timeout_sec=cfg.timeout_sec, + has_token=bool(cfg.matrixlab_token), + ok=bool(health.get("ok")), + error=health.get("error"), + remote=health.get("remote"), + env_override=_detect_env_override(), + ) + + +# ---------------------------------------------------------------------- +# Endpoints +# ---------------------------------------------------------------------- + +@router.get("/status", response_model=SandboxStatusResponse) +async def api_sandbox_status() -> SandboxStatusResponse: + """Report which backend is selected and whether it's reachable.""" + s: AppSettings = get_settings() + cfg = s.sandbox + workspace = Path.cwd() + sb = _build_sandbox(cfg, workspace=workspace, timeout=cfg.timeout_sec) + try: + health = await sb.health() + finally: + # MatrixLabSandbox owns an httpx client; close it so we don't + # leak sockets on every status poll from the settings page. 
+ aclose = getattr(sb, "aclose", None) + if aclose is not None: + await aclose() + return _status_from(cfg, health) + + +@router.put("/config", response_model=SandboxStatusResponse) +async def api_sandbox_config(update: SandboxConfigUpdate) -> SandboxStatusResponse: + """Persist new sandbox settings and return the resulting status.""" + if update.backend is not None and update.backend not in ALLOWED_BACKENDS: + raise HTTPException( + status_code=400, + detail=f"unknown sandbox backend {update.backend!r}; " + f"expected one of {sorted(ALLOWED_BACKENDS)}", + ) + + s: AppSettings = get_settings() + merged: Dict[str, Any] = s.sandbox.model_dump() + for field, value in update.model_dump(exclude_none=True).items(): + merged[field] = value + + updated = update_settings({"sandbox": merged}) + cfg = updated.sandbox + + # Probe the new configuration so the UI can flip its health pill in + # one round-trip (mirrors what /status does). + sb = _build_sandbox(cfg, workspace=Path.cwd(), timeout=cfg.timeout_sec) + try: + health = await sb.health() + finally: + aclose = getattr(sb, "aclose", None) + if aclose is not None: + await aclose() + return _status_from(cfg, health) + + +@router.post("/run", response_model=SandboxRunResponse) +async def api_sandbox_run(req: SandboxRunRequest) -> SandboxRunResponse: + """Execute a fenced-code snippet through the configured sandbox. + + Powers the per-codeblock Run button in AssistantMessage: the chat + UI POSTs ``{language, code}`` and renders ``stdout`` / ``stderr`` / + ``exit_code`` next to the snippet. The selected backend (local + subprocess vs MatrixLab) is whatever the user picked in Settings. 
+ """ + lang = req.language.strip().lower() + spec = LANGUAGE_RUNNERS.get(lang) + if spec is None: + raise HTTPException( + status_code=400, + detail=f"language {req.language!r} is not runnable; " + f"allowed: {sorted(set(LANGUAGE_RUNNERS))}", + ) + if not req.code.strip(): + raise HTTPException(status_code=400, detail="code is empty") + + s: AppSettings = get_settings() + cfg = s.sandbox + timeout = req.timeout_sec or cfg.timeout_sec or DEFAULT_TIMEOUT_SEC + + # MatrixLab has a purpose-built snippet endpoint (POST /code/run) that + # accepts {language, code} directly and dispatches into the right + # per-language sandbox image. When the user selected the matrixlab + # backend, route there instead of running the snippet locally and + # asking MatrixLab to re-execute the resulting argv via /repo/run — + # /code/run is what the Runner is designed to serve for this flow. + if (cfg.backend or "").strip().lower() == BACKEND_MATRIXLAB: + return await _run_via_matrixlab_code_endpoint(cfg, lang, req.code, timeout) + + # Materialise the snippet in a fresh tempdir so the workspace jail + # in SubprocessSandbox has somewhere to point at. MatrixLabSandbox + # mounts the same path into the container via ``mount_workspace``, + # so the runner sees the same file at the same path — keeping the + # contract identical across backends. 
+ with tempfile.TemporaryDirectory(prefix="gitpilot-run-") as tmp: + workspace = Path(tmp) + snippet_path = workspace / f"snippet{spec['suffix']}" + snippet_path.write_text(req.code, encoding="utf-8") + argv = [ + tok.replace("{file}", str(snippet_path)) for tok in spec["argv"] + ] + command_str = shlex.join(argv) + + sb = _build_sandbox(cfg, workspace=workspace, timeout=timeout) + try: + try: + result: SandboxResult = await sb.run( + argv, cwd=workspace, timeout=timeout + ) + except SandboxUnavailableError as exc: + raise HTTPException( + status_code=503, + detail=f"sandbox backend {cfg.backend!r} is unreachable: {exc}", + ) from exc + except SandboxRunError as exc: + raise HTTPException( + status_code=502, + detail=f"sandbox backend {cfg.backend!r} returned an error: {exc}", + ) from exc + except PermissionError as exc: + raise HTTPException(status_code=400, detail=str(exc)) from exc + finally: + aclose = getattr(sb, "aclose", None) + if aclose is not None: + await aclose() + + return SandboxRunResponse( + backend=result.backend, + language=lang, + command=command_str, + exit_code=result.exit_code, + stdout=result.stdout, + stderr=result.stderr, + duration_ms=result.duration_ms, + truncated=result.truncated, + timed_out=result.timed_out, + sandbox_id=result.sandbox_id, + ) + + +# MatrixLab's CodeRunRequest only accepts these literals; aliases ("py", +# "js", "node", "sh", "shell") get normalised before the call. +_MATRIXLAB_LANGUAGE = { + "python": "python", + "py": "python", + "javascript": "javascript", + "js": "javascript", + "node": "javascript", + "bash": "bash", + "sh": "bash", + "shell": "bash", +} + + +async def _run_via_matrixlab_code_endpoint( + cfg: SandboxSettings, lang: str, code: str, timeout: int +) -> SandboxRunResponse: + """POST /code/run on the MatrixLab Runner. 
+ + Direct call (not via :class:`MatrixLabSandbox`) because the snippet + endpoint takes ``{language, code}`` rather than the + command-with-mounted-workspace shape that ``/repo/run`` expects. + """ + target_lang = _MATRIXLAB_LANGUAGE.get(lang) + if target_lang is None: + raise HTTPException( + status_code=400, + detail=f"language {lang!r} is not supported by MatrixLab /code/run", + ) + base_url = (cfg.matrixlab_url or "http://localhost:8000").rstrip("/") + headers = {"Content-Type": "application/json"} + if cfg.matrixlab_token: + headers["Authorization"] = f"Bearer {cfg.matrixlab_token}" + body = { + "language": target_lang, + "code": code, + "timeout": timeout, + "allow_network": cfg.allow_network, + } + if cfg.matrixlab_image: + body["image"] = cfg.matrixlab_image + + start = time.monotonic() + try: + async with httpx.AsyncClient(timeout=timeout + 5) as client: + resp = await client.post(f"{base_url}/code/run", json=body, headers=headers) + except httpx.HTTPError as exc: + raise HTTPException( + status_code=503, + detail=f"sandbox backend 'matrixlab' is unreachable: {exc}", + ) from exc + duration_ms = int((time.monotonic() - start) * 1000) + + if resp.status_code >= 400: + raise HTTPException( + status_code=502, + detail=f"MatrixLab /code/run returned {resp.status_code}: {resp.text[:400]}", + ) + data = resp.json() + return SandboxRunResponse( + backend=BACKEND_MATRIXLAB, + language=lang, + command=f"{target_lang} ", + exit_code=int(data.get("exit_code", -1)), + stdout=str(data.get("stdout", "")), + stderr=str(data.get("stderr", "")), + duration_ms=int(data.get("duration_ms", duration_ms)), + truncated=bool(data.get("truncated", False)), + timed_out=bool(data.get("timed_out", False)), + sandbox_id=data.get("sandbox_id"), + ) + + +# ---------------------------------------------------------------------- +# MatrixLab lifecycle (install / start) — opt-in via env flag +# ---------------------------------------------------------------------- +# +# Lifecycle endpoints 
shell out to the host (``docker pull``, +# ``docker run``), so they are gated behind ``GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE=1``. +# When the gate is off, GET /lifecycle still works — it just reports +# the inventory and surfaces a clear "operator must enable" message +# on the action booleans, and POST /install / /start return 403. This +# keeps the default GitPilot deployment honest: no shell from a web +# endpoint unless the operator opted in. + +ENV_LIFECYCLE = "GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE" +# Image the Runner ships under (matches matrixlab/Makefile's +# $(REGISTRY)/$(DOCKERHUB_NAMESPACE)/matrixlab-runner). Operator can +# override via env when running a custom build. +DEFAULT_RUNNER_IMAGE = os.environ.get( + "GITPILOT_MATRIXLAB_RUNNER_IMAGE", + "ruslanmv/matrixlab-runner:latest", +) +# Sandbox images the Runner spawns per language. Pulling these at +# install time means the first /code/run from the chat UI doesn't +# stall on a multi-hundred-MB image fetch. +DEFAULT_SANDBOX_IMAGES = [ + "matrix-lab-sandbox-python:latest", + "matrix-lab-sandbox-node:latest", + "matrix-lab-sandbox-utils:latest", +] +DEFAULT_CONTAINER_NAME = os.environ.get( + "GITPILOT_MATRIXLAB_CONTAINER", + "gitpilot-matrixlab", +) + + +class _StepResult(BaseModel): + cmd: str + exit_code: int + stdout: str = "" + stderr: str = "" + duration_ms: int = 0 + + +class MatrixLabLifecycleResponse(BaseModel): + docker_available: bool + installed: bool + running: bool + lifecycle_enabled: bool + runner_image: str + sandbox_images: List[str] + container_name: str + matrixlab_url: str + instructions: Optional[str] = None + error: Optional[str] = None + steps: List[_StepResult] = Field(default_factory=list) + + +def _lifecycle_enabled() -> bool: + return os.environ.get(ENV_LIFECYCLE, "").strip().lower() in {"1", "true", "yes", "on"} + + +async def _run_shell(cmd: List[str], *, timeout: int = 600) -> _StepResult: + """Run a host command, capture stdout/stderr, never raise. 
+ + Used for the docker / matrixlab lifecycle commands so the response + body always carries the full transcript even when a step fails — + matches the "errors are first-class signals" UX of the agent loop. + """ + start = time.monotonic() + try: + proc = await asyncio.create_subprocess_exec( + *cmd, + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + try: + stdout_b, stderr_b = await asyncio.wait_for(proc.communicate(), timeout=timeout) + except asyncio.TimeoutError: + proc.kill() + return _StepResult( + cmd=shlex.join(cmd), + exit_code=-1, + stderr=f"timed out after {timeout}s", + duration_ms=int((time.monotonic() - start) * 1000), + ) + return _StepResult( + cmd=shlex.join(cmd), + exit_code=proc.returncode or 0, + stdout=stdout_b.decode("utf-8", errors="replace")[:8_000], + stderr=stderr_b.decode("utf-8", errors="replace")[:8_000], + duration_ms=int((time.monotonic() - start) * 1000), + ) + except FileNotFoundError as exc: + return _StepResult( + cmd=shlex.join(cmd), + exit_code=-2, + stderr=str(exc), + duration_ms=int((time.monotonic() - start) * 1000), + ) + + +def _docker_available() -> bool: + return shutil.which("docker") is not None + + +async def _docker_image_present(name: str) -> bool: + """True when ``docker images -q `` returns at least one ID.""" + if not _docker_available(): + return False + step = await _run_shell(["docker", "images", "-q", name], timeout=10) + return step.exit_code == 0 and bool(step.stdout.strip()) + + +async def _matrixlab_running() -> bool: + """Probe the configured Runner URL for a healthy /health response.""" + cfg = get_settings().sandbox + base = (cfg.matrixlab_url or "http://localhost:8000").rstrip("/") + try: + async with httpx.AsyncClient(timeout=3.0) as client: + resp = await client.get(f"{base}/health") + return resp.status_code == 200 + except httpx.HTTPError: + return False + + +async def _gather_lifecycle_status(steps: Optional[List[_StepResult]] = None) -> MatrixLabLifecycleResponse: + cfg = 
get_settings().sandbox + docker_ok = _docker_available() + runner_installed = await _docker_image_present(DEFAULT_RUNNER_IMAGE) + running = await _matrixlab_running() + enabled = _lifecycle_enabled() + instructions: Optional[str] = None + if not docker_ok: + instructions = ( + "Docker is not installed or not on PATH on the GitPilot host. " + "Install Docker (https://docs.docker.com/get-docker/) before " + "the Install / Start buttons can do anything." + ) + elif not enabled: + instructions = ( + "Lifecycle automation is off. To let GitPilot pull and start " + f"MatrixLab from the Settings panel set the {ENV_LIFECYCLE}=1 " + "environment variable on the GitPilot backend and restart. " + "Until then, run 'docker compose up -d' from a MatrixLab " + "checkout (https://github.com/agent-matrix/matrixlab) yourself." + ) + return MatrixLabLifecycleResponse( + docker_available=docker_ok, + installed=runner_installed, + running=running, + lifecycle_enabled=enabled, + runner_image=DEFAULT_RUNNER_IMAGE, + sandbox_images=DEFAULT_SANDBOX_IMAGES, + container_name=DEFAULT_CONTAINER_NAME, + matrixlab_url=cfg.matrixlab_url, + instructions=instructions, + steps=steps or [], + ) + + +@router.get("/matrixlab/lifecycle", response_model=MatrixLabLifecycleResponse) +async def api_matrixlab_lifecycle() -> MatrixLabLifecycleResponse: + """Report whether MatrixLab is installed locally and running. + + Used by the Settings panel to decide which button to show: + Install (when no runner image present) → Start (image present but + URL unreachable) → Running (URL healthy). Always safe to call — + pure inspection, no side effects. + """ + return await _gather_lifecycle_status() + + +@router.post("/matrixlab/install", response_model=MatrixLabLifecycleResponse) +async def api_matrixlab_install() -> MatrixLabLifecycleResponse: + """Pull the MatrixLab runner + sandbox images. + + Gated by GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE. 
Each pull is a + distinct step in the response so the UI can show a per-image + progress strip (and the operator can re-read failures verbatim). + """ + if not _lifecycle_enabled(): + raise HTTPException( + status_code=403, + detail=( + f"set {ENV_LIFECYCLE}=1 on the GitPilot backend to enable " + "the Install button" + ), + ) + if not _docker_available(): + raise HTTPException(status_code=503, detail="docker is not on PATH") + steps: List[_StepResult] = [] + images = [DEFAULT_RUNNER_IMAGE, *DEFAULT_SANDBOX_IMAGES] + for image in images: + steps.append(await _run_shell(["docker", "pull", image], timeout=900)) + return await _gather_lifecycle_status(steps=steps) + + +@router.post("/matrixlab/start", response_model=MatrixLabLifecycleResponse) +async def api_matrixlab_start() -> MatrixLabLifecycleResponse: + """Start the MatrixLab runner as a detached container. + + Gated by GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE. The container name + is deterministic (``gitpilot-matrixlab`` by default) so repeated + clicks reuse it — ``docker start`` an existing stopped container, + or ``docker run`` if it doesn't exist yet. The Docker socket is + bind-mounted so the runner can spawn per-language sandbox + containers — matches what 'make run' inside a MatrixLab checkout + does. + """ + if not _lifecycle_enabled(): + raise HTTPException( + status_code=403, + detail=( + f"set {ENV_LIFECYCLE}=1 on the GitPilot backend to enable " + "the Start button" + ), + ) + if not _docker_available(): + raise HTTPException(status_code=503, detail="docker is not on PATH") + + steps: List[_StepResult] = [] + # Determine which port to expose locally — derive it from the + # configured matrixlab_url so 'Start' agrees with /status. 
+ cfg = get_settings().sandbox + port = 8000 + try: + from urllib.parse import urlparse + + parsed = urlparse(cfg.matrixlab_url) + if parsed.port: + port = int(parsed.port) + except Exception: # noqa: BLE001 + port = 8000 + + # Does a container with the canonical name already exist? + inspect = await _run_shell( + ["docker", "inspect", "--format", "{{.State.Status}}", DEFAULT_CONTAINER_NAME], + timeout=15, + ) + steps.append(inspect) + if inspect.exit_code == 0: + # Container exists — start it if stopped, otherwise leave it be. + steps.append(await _run_shell(["docker", "start", DEFAULT_CONTAINER_NAME], timeout=60)) + else: + run_cmd = [ + "docker", "run", "-d", + "--name", DEFAULT_CONTAINER_NAME, + "-p", f"{port}:8000", + "-v", "/var/run/docker.sock:/var/run/docker.sock", + "--restart", "unless-stopped", + DEFAULT_RUNNER_IMAGE, + ] + steps.append(await _run_shell(run_cmd, timeout=120)) + + return await _gather_lifecycle_status(steps=steps) + + +@router.post("/matrixlab/stop", response_model=MatrixLabLifecycleResponse) +async def api_matrixlab_stop() -> MatrixLabLifecycleResponse: + """Stop the GitPilot-managed MatrixLab container. + + Only affects the deterministic ``gitpilot-matrixlab`` container — + won't touch containers an operator launched manually. Gated by + the same env flag as install/start. 
+ """ + if not _lifecycle_enabled(): + raise HTTPException( + status_code=403, + detail=f"set {ENV_LIFECYCLE}=1 to enable lifecycle actions", + ) + if not _docker_available(): + raise HTTPException(status_code=503, detail="docker is not on PATH") + step = await _run_shell(["docker", "stop", DEFAULT_CONTAINER_NAME], timeout=30) + return await _gather_lifecycle_status(steps=[step]) diff --git a/gitpilot/session.py b/gitpilot/session.py index 6b5f907..6915967 100644 --- a/gitpilot/session.py +++ b/gitpilot/session.py @@ -42,6 +42,35 @@ class Checkpoint: snapshot_path: str | None = None +@dataclass +class Task: + """One AI invocation recorded for the right-sidebar Tasks panel. + + Append-only. Created with ``status="running"`` at the start of a + user-facing operation (Plan or Execute), mutated in place once on + completion, and never edited again. The shape intentionally + mirrors what Claude Code surfaces in its tasks list: title + kind + + status + duration + token usage. + + Cost / cache / payload size are deferred to a later cut — v1 ships + only what GitPilot can compute honestly across every supported + provider. 
+ """ + id: str = field(default_factory=lambda: uuid.uuid4().hex) + kind: str = "plan" # plan | execute | (future: explore, code_write…) + title: str = "" + status: str = "running" # running | completed | failed + started_at: str = field( + default_factory=lambda: datetime.now(UTC).isoformat(), + ) + completed_at: str | None = None + duration_ms: int | None = None + prompt_tokens: int | None = None + completion_tokens: int | None = None + error: str | None = None + metadata: dict[str, Any] = field(default_factory=dict) + + @dataclass class Session: id: str = field(default_factory=lambda: uuid.uuid4().hex[:16]) @@ -70,6 +99,11 @@ class Session: repos: list[dict[str, Any]] = field(default_factory=list) active_repo: str | None = None # full_name of the write-target repo + # Right-sidebar Tasks panel (Claude-Code-style trace of every AI + # invocation in this session). Append-only. Backwards-compatible + # default for sessions that pre-date this field. + tasks: list[Task] = field(default_factory=list) + def add_message(self, role: str, content: str, **meta): self.messages.append(Message(role=role, content=content, metadata=meta)) self.updated_at = datetime.now(UTC).isoformat() @@ -82,6 +116,9 @@ def from_dict(cls, data: dict[str, Any]) -> Session: data = dict(data) # shallow copy data["messages"] = [Message(**m) for m in data.get("messages", [])] data["checkpoints"] = [Checkpoint(**c) for c in data.get("checkpoints", [])] + # Backwards-compatible: sessions saved before the tasks field + # existed simply load with an empty list. 
+ data["tasks"] = [Task(**t) for t in data.get("tasks", [])] # Backwards-compatible migration: populate repos from legacy single-repo if not data.get("repos") and data.get("repo_full_name"): diff --git a/gitpilot/settings.py b/gitpilot/settings.py index 66cccc5..96252b2 100644 --- a/gitpilot/settings.py +++ b/gitpilot/settings.py @@ -64,6 +64,31 @@ class OllaBridgeConfig(BaseModel): api_key: str = Field(default="") # Optional: for authenticated endpoints +class SandboxSettings(BaseModel): + """Where code/commands generated by GitPilot run. + + ``subprocess`` is the safe local default — host subprocess with a cwd jail, + secret-scrubbing, and the destructive-pattern denylist from + :mod:`gitpilot.sandbox`. Switch to ``matrixlab`` to delegate execution to + a MatrixLab Runner (containerised, ephemeral, resource-limited) for + enterprise-grade isolation. ``off`` is the pass-through backend + (:class:`gitpilot.sandbox.NullSandbox`) — same as subprocess but without + the cwd jail; intended for local dev only. + + Persisted fields mirror the env vars the sandbox module already honours + (``GITPILOT_SANDBOX``, ``GITPILOT_MATRIXLAB_URL``, ...). Env vars still + take precedence at sandbox-resolution time so deployments can override + user settings without touching disk. + """ + + backend: str = Field(default="subprocess") + matrixlab_url: str = Field(default="http://localhost:8000") + matrixlab_token: str = Field(default="") + matrixlab_image: str = Field(default="") + allow_network: bool = Field(default=False) + timeout_sec: int = Field(default=120) + + class AppSettings(BaseModel): provider: LLMProvider = Field(default=LLMProvider.ollabridge) @@ -73,6 +98,13 @@ class AppSettings(BaseModel): ollama: OllamaConfig = Field(default_factory=OllamaConfig) ollabridge: OllaBridgeConfig = Field(default_factory=OllaBridgeConfig) + # Sandbox runtime for "Run code" actions in the chat UI. 
Defaults to a + # local subprocess so trying simple snippets works out of the box; switch + # to MatrixLab from the Settings modal when an enterprise-grade isolated + # runner is needed. See :class:`SandboxSettings` for the field shape and + # :mod:`gitpilot.sandbox` for the resolution precedence. + sandbox: SandboxSettings = Field(default_factory=SandboxSettings) + # Lite Mode: optimized for small LLMs (< 7B parameters). # Uses simplified prompts, single-agent execution, and pre-fetched context # instead of multi-agent pipelines with tool-calling. @@ -149,6 +181,18 @@ def from_disk(cls) -> AppSettings: if os.getenv("GITPILOT_LANGFLOW_PLAN_FLOW_ID"): settings.langflow_plan_flow_id = os.getenv("GITPILOT_LANGFLOW_PLAN_FLOW_ID") + # Sandbox runtime — env always wins (same precedence the runtime + # resolution in :mod:`gitpilot.sandbox` already enforces), so an + # operator can pin the backend without editing settings.json. + if os.getenv("GITPILOT_SANDBOX"): + settings.sandbox.backend = os.environ["GITPILOT_SANDBOX"] + if os.getenv("GITPILOT_MATRIXLAB_URL"): + settings.sandbox.matrixlab_url = os.environ["GITPILOT_MATRIXLAB_URL"] + if os.getenv("GITPILOT_MATRIXLAB_TOKEN"): + settings.sandbox.matrixlab_token = os.environ["GITPILOT_MATRIXLAB_TOKEN"] + if os.getenv("GITPILOT_MATRIXLAB_IMAGE"): + settings.sandbox.matrixlab_image = os.environ["GITPILOT_MATRIXLAB_IMAGE"] + # Lite mode may be intentionally controlled by env in CI or deployments. 
env_lite = os.getenv("GITPILOT_LITE_MODE", "").strip().lower() if env_lite in ("1", "true", "yes", "on"): @@ -463,6 +507,10 @@ def update_settings(updates: dict[str, Any]) -> AppSettings: merged = _merge_model_config(_settings.ollabridge, updates["ollabridge"]) _settings.ollabridge = OllaBridgeConfig(**merged) + if "sandbox" in updates: + merged = _merge_model_config(_settings.sandbox, updates["sandbox"]) + _settings.sandbox = SandboxSettings(**merged) + if "lite_mode" in updates: _settings.lite_mode = bool(updates["lite_mode"]) diff --git a/gitpilot/task_recorder.py b/gitpilot/task_recorder.py new file mode 100644 index 0000000..01a6e85 --- /dev/null +++ b/gitpilot/task_recorder.py @@ -0,0 +1,118 @@ +"""Right-sidebar Tasks panel — recorder helpers. + +Implements the smallest possible contract that lets the chat UI trace +every user-facing AI invocation (Plan, Execute) the way Claude Code's +right-pane tasks list does. + +Design notes: + +* **Append-only.** Once a task lands in ``Session.tasks`` it is + mutated exactly once (on completion) and never edited again. This + matches the audit-trail philosophy of the rest of the session + format. +* **Endpoint-level wrap, not deep in agentic.py.** ``begin_task`` is + called at the start of an endpoint, ``finish_task`` in its finally — + the agent stack itself is untouched, so the cut is trivially + revertible. +* **Best-effort persistence.** A failure to write the task back to + disk must never block the user-facing endpoint — the agent already + ran, the user already has their result. We log and move on. +* **Flag-gated.** When ``tasks_sidebar`` is off, ``begin_task`` + returns ``None`` and ``finish_task`` is a no-op so the backend is + byte-identical to today. +""" +from __future__ import annotations + +import logging +from datetime import UTC, datetime +from time import perf_counter +from typing import Optional + +from . 
import flags +from .session import SessionManager, Task + +logger = logging.getLogger(__name__) + +FLAG_TASKS_SIDEBAR = "tasks_sidebar" + + +def begin_task( + session_mgr: SessionManager, + session_id: Optional[str], + *, + kind: str, + title: str, +) -> Optional[Task]: + """Append a ``running`` Task to the session and persist. + + Returns the in-flight Task so the caller can pass it back to + :func:`finish_task` later. Returns ``None`` when recording is + disabled or the session can't be loaded — callers must tolerate + the absent task gracefully. + """ + if not flags.is_on(FLAG_TASKS_SIDEBAR, default=True): + return None + if not session_id: + return None + try: + session = session_mgr.load(session_id) + except Exception as exc: + logger.debug("[tasks] session %s not loadable: %s", session_id, exc) + return None + + task = Task(kind=kind, title=title, status="running") + # Attach a perf-counter start tick on the in-memory object so + # finish_task can compute duration_ms without scanning timestamps. + task.metadata["_perf_t0"] = perf_counter() + session.tasks.append(task) + try: + session_mgr.save(session) + except Exception as exc: # pragma: no cover - defensive + logger.debug("[tasks] could not persist initial task: %s", exc) + return task + + +def finish_task( + session_mgr: SessionManager, + session_id: Optional[str], + task: Optional[Task], + *, + status: str = "completed", + error: Optional[str] = None, + prompt_tokens: Optional[int] = None, + completion_tokens: Optional[int] = None, +) -> None: + """Mark a previously-begun task as completed/failed and persist. + + Idempotent: safe to call with ``task=None`` (when ``begin_task`` + returned None because the flag was off or the session was missing). 
+ """ + if task is None or not session_id: + return + t0 = task.metadata.pop("_perf_t0", None) + if isinstance(t0, (int, float)): + task.duration_ms = int((perf_counter() - t0) * 1000) + task.status = status + if error is not None: + task.error = error[:500] + if prompt_tokens is not None: + task.prompt_tokens = prompt_tokens + if completion_tokens is not None: + task.completion_tokens = completion_tokens + task.completed_at = datetime.now(UTC).isoformat() + + # Reload the session before saving so we don't clobber any writes + # the agent stack made in the meantime (e.g. branch persistence on + # execute) — the running-task entry was already there, we only need + # to swap its final state in. + try: + fresh = session_mgr.load(session_id) + for i, existing in enumerate(fresh.tasks): + if existing.id == task.id: + fresh.tasks[i] = task + break + else: + fresh.tasks.append(task) + session_mgr.save(fresh) + except Exception as exc: # pragma: no cover - defensive + logger.debug("[tasks] could not persist completed task: %s", exc) diff --git a/mypy.ini b/mypy.ini index 9904baa..0bcbcf3 100644 --- a/mypy.ini +++ b/mypy.ini @@ -46,7 +46,22 @@ files = gitpilot/init_wizard.py, gitpilot/_deprecation.py, gitpilot/plan_guards.py, - gitpilot/context_meter.py + gitpilot/context_meter.py, + gitpilot/task_recorder.py, + gitpilot/grep_backend.py, + gitpilot/auto_compact.py, + gitpilot/explorer_summary.py, + gitpilot/repo_map.py, + gitpilot/rag/__init__.py, + gitpilot/rag/chunker.py, + gitpilot/rag/embedder.py, + gitpilot/rag/indexer.py, + gitpilot/rag/retriever.py, + gitpilot/rag/store.py, + gitpilot/edit_backend.py, + gitpilot/rag_consent.py, + gitpilot/query_router.py, + gitpilot/agent_prompts.py # The minimal in-tree skill front-matter parser uses dynamic typing for # its returned dict; keep it permissive without weakening the gate. 
diff --git a/tests/test_agent_prompts.py b/tests/test_agent_prompts.py new file mode 100644 index 0000000..ff91b7d --- /dev/null +++ b/tests/test_agent_prompts.py @@ -0,0 +1,243 @@ +"""Tests for the lean-prompt module (Batch B12). + +Pin five properties so a future "let me add one more rule" edit can't +silently regress the small-model context budget: + +1. Every prompt is within its declared character budget. +2. None of the forbidden small-model keywords appear anywhere in the + rendered prompts. +3. The "Known facts" block is in the bottom 250 chars of every + render_plan_task() output (last-segment-attention principle). +4. The intent-routed rule block is the correct one for each intent + (create / modify / fix / delete / find / info / unknown / None). +5. The flag can be toggled without breaking imports. +""" +from __future__ import annotations + +import pytest + +from gitpilot import flags +from gitpilot.agent_prompts import ( + CODE_WRITER_BACKSTORY, + CODE_WRITER_BACKSTORY_BUDGET, + CREATE_FILE_TASK_CHAR_BUDGET, + EXPLORER_BACKSTORY, + EXPLORER_BACKSTORY_BUDGET, + EXPLORER_TASK_CHAR_BUDGET, + FLAG_LEAN_PROMPTS, + FORBIDDEN_KEYWORDS, + MODIFY_FILE_TASK_CHAR_BUDGET, + PLAN_TASK_CHAR_BUDGET, + PLANNER_BACKSTORY, + PLANNER_BACKSTORY_BUDGET, + SPECIALIST_BACKSTORIES, + SPECIALIST_BACKSTORY_BUDGET, + lean_prompts_enabled, + render_create_file_task, + render_explorer_task, + render_modify_file_task, + render_plan_task, +) + + +# ---------------------------------------------------------------------- +# Per-prompt budget enforcement +# ---------------------------------------------------------------------- + +SAMPLE_FILE_LIST = ["README.md", "src/main.py", "src/util.py", "tests/test_main.py"] + + +@pytest.mark.parametrize( + "intent", + ["create", "modify", "fix", "delete", "find", "info", "unknown", None], +) +def test_plan_task_within_budget_for_every_intent(intent: str | None) -> None: + rendered = render_plan_task( + goal="do something specific that takes a few 
words to describe", + repo_full_name="owner/repo-with-a-longish-name", + active_ref="some/branch-name-that-is-longer", + file_list=SAMPLE_FILE_LIST, + intent=intent, + ) + assert len(rendered) <= PLAN_TASK_CHAR_BUDGET, ( + f"intent={intent}: {len(rendered)} > {PLAN_TASK_CHAR_BUDGET}" + ) + + +def test_explorer_task_within_budget() -> None: + rendered = render_explorer_task( + repo_full_name="owner/repo", active_ref="main", + ) + assert len(rendered) <= EXPLORER_TASK_CHAR_BUDGET + + +def test_create_file_task_within_budget() -> None: + rendered = render_create_file_task( + file_path="src/very/deep/path/file.py", + goal="generate a thing", + step_description="step that does the thing", + ) + assert len(rendered) <= CREATE_FILE_TASK_CHAR_BUDGET + + +def test_modify_file_task_within_budget() -> None: + rendered = render_modify_file_task( + file_path="src/util.py", + goal="fix the bug", + step_description="patch the validator", + current_content="def x():\n return 1\n", + ) + # The content varies; we cap only the framing. Subtract the + # length of the current content to test just the rules + format. + framing = len(rendered) - len("def x():\n return 1\n") + assert framing <= MODIFY_FILE_TASK_CHAR_BUDGET + + +def test_backstories_within_budget() -> None: + assert len(EXPLORER_BACKSTORY) <= EXPLORER_BACKSTORY_BUDGET + assert len(PLANNER_BACKSTORY) <= PLANNER_BACKSTORY_BUDGET + assert len(CODE_WRITER_BACKSTORY) <= CODE_WRITER_BACKSTORY_BUDGET + for name, body in SPECIALIST_BACKSTORIES.items(): + assert len(body) <= SPECIALIST_BACKSTORY_BUDGET, name + + +# ---------------------------------------------------------------------- +# Forbidden-keyword scrub +# ---------------------------------------------------------------------- + +def _all_rendered_prompts() -> str: + """Every prompt the lean module produces, concatenated. 
Used as + the corpus for forbidden-keyword greps.""" + parts: list[str] = [ + EXPLORER_BACKSTORY, PLANNER_BACKSTORY, CODE_WRITER_BACKSTORY, + *SPECIALIST_BACKSTORIES.values(), + render_explorer_task(repo_full_name="o/r", active_ref="m"), + render_create_file_task(file_path="a.py", goal="g", step_description="d"), + render_modify_file_task( + file_path="a.py", goal="g", step_description="d", current_content="", + ), + ] + for intent in ( + "create", "modify", "fix", "delete", "find", "info", "unknown", None, + ): + parts.append( + render_plan_task( + goal="g", repo_full_name="o/r", active_ref="m", + file_list=["x.py"], intent=intent, + ) + ) + return "\n".join(parts) + + +@pytest.mark.parametrize("keyword", FORBIDDEN_KEYWORDS) +def test_no_forbidden_keyword_in_any_rendered_prompt(keyword: str) -> None: + corpus = _all_rendered_prompts() + assert keyword not in corpus, ( + f"Forbidden keyword {keyword!r} still appears in a rendered prompt" + ) + + +# ---------------------------------------------------------------------- +# Facts block is at the bottom +# ---------------------------------------------------------------------- + +def test_facts_block_lives_near_end_of_plan_task() -> None: + """Small models over-weight the final segment of the prompt. 
The + "Known facts" block must live in the last 250 chars so the file- + list ground truth gets that attention.""" + rendered = render_plan_task( + goal="do thing", + repo_full_name="o/r", + active_ref="main", + file_list=["README.md"], + intent="create", + ) + tail = rendered[-300:] + assert "Known facts:" in tail + assert "does NOT exist" in tail + + +# ---------------------------------------------------------------------- +# Intent → rule block routing +# ---------------------------------------------------------------------- + +_INTENT_RULE_MARKERS = { + "create": "at least one CREATE", + "modify": "Use MODIFY only", + "fix": "Use MODIFY only", # fix aliases to modify + "delete": "Use DELETE only", + "find": "Plan READ actions", + "info": "Empty steps is fine", + "unknown": "Match the action to what", +} + + +@pytest.mark.parametrize("intent, marker", list(_INTENT_RULE_MARKERS.items())) +def test_each_intent_pulls_its_own_rules(intent: str, marker: str) -> None: + rendered = render_plan_task( + goal="x", repo_full_name="o/r", active_ref="m", + file_list=["a.py"], intent=intent, + ) + assert marker in rendered, f"intent={intent} missing marker {marker!r}" + + +def test_no_intent_falls_back_to_unknown_block() -> None: + """When intent is None (router skipped / disabled) we don't want a + crash — pick the generic rule block.""" + rendered = render_plan_task( + goal="x", repo_full_name="o/r", active_ref="m", + file_list=["a.py"], intent=None, + ) + assert _INTENT_RULE_MARKERS["unknown"] in rendered + + +def test_create_intent_does_not_carry_delete_rules() -> None: + """The whole point of intent routing — small models stop seeing + the deletion rule block when the goal isn't a deletion.""" + rendered = render_plan_task( + goal="g", repo_full_name="o/r", active_ref="m", + file_list=["a.py"], intent="create", + ) + assert "Use DELETE only" not in rendered + assert "Use MODIFY only" not in rendered + + +# 
---------------------------------------------------------------------- +# Flag plumbing +# ---------------------------------------------------------------------- + +def test_flag_default_on() -> None: + assert lean_prompts_enabled() is True + + +def test_flag_can_be_turned_off() -> None: + flags.set_override(FLAG_LEAN_PROMPTS, False) + try: + assert lean_prompts_enabled() is False + finally: + flags.clear_override(FLAG_LEAN_PROMPTS) + + +# ---------------------------------------------------------------------- +# Total prompt-stack budget for the canonical failure scenario +# ---------------------------------------------------------------------- + +def test_total_planner_stack_under_3k_chars_on_tiny_repo() -> None: + """The original llama3:8b failure trace happened with a planner + stack of ~4.5k chars (12-15 KB including tool schemas). After + B12 the stack — backstory + task description — fits in 3 KB on + a single-file repo, leaving room for the tool-schema preamble + inside an 8 k context window.""" + stack = ( + PLANNER_BACKSTORY + + render_plan_task( + goal="create a simple python code about what says the README.md", + repo_full_name="INFN-GE/Nuclear-Physics", + active_ref="master", + file_list=["README.md"], + intent="create", + ) + ) + assert len(stack) < 3000, ( + f"planner stack is {len(stack)} chars — small-model budget regression" + ) diff --git a/tests/test_agent_tools_contract.py b/tests/test_agent_tools_contract.py new file mode 100644 index 0000000..590f0b8 --- /dev/null +++ b/tests/test_agent_tools_contract.py @@ -0,0 +1,50 @@ +"""Regression tests for the default CrewAI repository tool contract.""" +from __future__ import annotations + +import ast +from pathlib import Path + +AGENT_TOOLS = Path(__file__).resolve().parents[1] / "gitpilot" / "agent_tools.py" + + +def _module() -> ast.Module: + return ast.parse(AGENT_TOOLS.read_text()) + + +def _function(name: str) -> ast.FunctionDef: + for node in _module().body: + if isinstance(node, ast.FunctionDef) 
and node.name == name: + return node + raise AssertionError(f"function {name!r} not found") + + +def test_primary_read_tool_keeps_single_argument_schema() -> None: + """Keep the common read tool simple for smaller ReAct models.""" + read_file = _function("read_file") + + assert [arg.arg for arg in read_file.args.args] == ["file_path"] + assert read_file.args.defaults == [] + + +def test_default_repository_tools_use_stable_explorer_surface() -> None: + """The explorer's default tools should match the pre-B1 safe set.""" + module = _module() + assignments = [ + node + for node in module.body + if isinstance(node, ast.Assign) + and any( + isinstance(target, ast.Name) and target.id == "REPOSITORY_TOOLS" + for target in node.targets + ) + ] + assert assignments, "REPOSITORY_TOOLS assignment not found" + + value = assignments[-1].value + assert isinstance(value, ast.List) + assert [elt.id for elt in value.elts if isinstance(elt, ast.Name)] == [ + "list_repository_files", + "get_directory_structure", + "read_file", + "get_repository_summary", + ] diff --git a/tests/test_auto_compact.py b/tests/test_auto_compact.py new file mode 100644 index 0000000..1c3e6ec --- /dev/null +++ b/tests/test_auto_compact.py @@ -0,0 +1,172 @@ +"""Tests for the auto-compaction hook (Batch B3). 
+ +Pin three things: + +* below threshold → no-op (we don't fold prematurely) +* above threshold → fold older non-essential turns into a single + summary system message, keep the last N recent turns +* idempotency — running compaction twice on already-compacted history + doesn't fold the summary into another summary +""" +from __future__ import annotations + +import pytest + +from gitpilot import api as api_module +from gitpilot import flags +from gitpilot.auto_compact import ( + COMPACTED_FLAG, + DEFAULT_KEEP_RECENT_TURNS, + FLAG_AUTO_COMPACT, + SUMMARY_LABEL, + maybe_compact_session, +) +from gitpilot.context_budget import estimate_tokens +from gitpilot.session import Message + + +def _make_session_with_history(messages: list[tuple[str, str]]): + """Build and save a session with the supplied (role, content) pairs.""" + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="compact-test" + ) + for role, content in messages: + session.messages.append(Message(role=role, content=content)) + api_module._session_mgr.save(session) + return session + + +def test_no_op_below_threshold() -> None: + """A handful of short messages must not trigger compaction even on + an 8 k window — that would be ridiculous.""" + session = _make_session_with_history([ + ("user", "do thing"), + ("assistant", "ok"), + ("user", "thanks"), + ]) + report = maybe_compact_session( + api_module._session_mgr, session.id, context_window=8_192 + ) + assert report.compacted is False + reloaded = api_module._session_mgr.load(session.id) + assert len(reloaded.messages) == 3 # untouched + + +def test_folds_above_threshold_and_preserves_recent_turns() -> None: + """A long history must fold older non-essential turns into a + single summary, keeping the last N recent messages verbatim.""" + chunk = "lorem ipsum " * 200 # ~400 tokens per message under heuristic + messages = [("user" if i % 2 == 0 else "assistant", chunk) for i in range(20)] + session = 
_make_session_with_history(messages) + + report = maybe_compact_session( + api_module._session_mgr, session.id, context_window=8_192 + ) + assert report.compacted is True + assert report.before_tokens > report.after_tokens + assert report.messages_folded >= 1 + + reloaded = api_module._session_mgr.load(session.id) + # Recent turns preserved exactly. + assert len(reloaded.messages) == 1 + DEFAULT_KEEP_RECENT_TURNS + summary = reloaded.messages[0] + assert summary.role == "system" + assert SUMMARY_LABEL in summary.content + assert summary.metadata.get(COMPACTED_FLAG) == "1" + # Last N recent turns are still verbatim. + for m in reloaded.messages[1:]: + assert m.content == chunk + + +def test_idempotent_no_recompact_of_summary() -> None: + """Running compaction twice must not fold the summary itself.""" + chunk = "lorem ipsum " * 200 + messages = [("user" if i % 2 == 0 else "assistant", chunk) for i in range(20)] + session = _make_session_with_history(messages) + + first = maybe_compact_session( + api_module._session_mgr, session.id, context_window=8_192 + ) + assert first.compacted is True + + second = maybe_compact_session( + api_module._session_mgr, session.id, context_window=8_192 + ) + # Either: stayed below threshold post-fold (compacted=False), OR + # found nothing new to fold (compacted=False with that reason). + # Either way: no further folding of an already-summary entry. 
+ assert second.compacted is False + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.messages[0].metadata.get(COMPACTED_FLAG) == "1" + + +def test_flag_off_is_a_noop() -> None: + session = _make_session_with_history([ + ("user", "x" * 8000) for _ in range(20) + ]) + flags.set_override(FLAG_AUTO_COMPACT, False) + try: + report = maybe_compact_session( + api_module._session_mgr, session.id, context_window=8_192 + ) + finally: + flags.clear_override(FLAG_AUTO_COMPACT) + assert report.compacted is False + assert report.reason == "flag off" + + +def test_missing_session_returns_clean_report() -> None: + report = maybe_compact_session( + api_module._session_mgr, "does-not-exist", context_window=8_192 + ) + assert report.compacted is False + + +def test_no_session_id_is_noop() -> None: + report = maybe_compact_session( + api_module._session_mgr, None, context_window=8_192 + ) + assert report.compacted is False + + +def test_reserved_response_lowers_effective_budget() -> None: + """A bigger reserved_response should make compaction fire SOONER — + less effective budget means the threshold is hit earlier.""" + chunk = "lorem ipsum " * 100 # ~200 tok each + msgs = [("user" if i % 2 == 0 else "assistant", chunk) for i in range(20)] + + s1 = _make_session_with_history(msgs) + s2 = _make_session_with_history(msgs) + + # Low reservation: budget is bigger → less likely to fire. + r_low = maybe_compact_session( + api_module._session_mgr, s1.id, context_window=8_192, + reserved_response=0, + ) + # High reservation: budget is tight → much more likely to fire. + r_high = maybe_compact_session( + api_module._session_mgr, s2.id, context_window=8_192, + reserved_response=6_000, + ) + # The high-reservation run must compact at least as aggressively + # as the low-reservation one. + if r_high.compacted: + assert True # at-least-as-aggressive + else: + # If neither fires, the low one must also not have fired. 
+ assert r_low.compacted is False + + +def test_summary_actually_shrinks_token_total() -> None: + """The whole point of compaction is to free budget.""" + chunk = "lorem ipsum " * 400 # ~800 tok each + session = _make_session_with_history([ + ("user" if i % 2 == 0 else "assistant", chunk) for i in range(20) + ]) + before = sum(estimate_tokens(m.content) for m in session.messages) + report = maybe_compact_session( + api_module._session_mgr, session.id, context_window=8_192 + ) + assert report.compacted is True + assert report.after_tokens < report.before_tokens + assert report.before_tokens == before diff --git a/tests/test_chat_plan_friendly_errors.py b/tests/test_chat_plan_friendly_errors.py index ca1a59d..63425b5 100644 --- a/tests/test_chat_plan_friendly_errors.py +++ b/tests/test_chat_plan_friendly_errors.py @@ -40,10 +40,10 @@ def _mount_failing_planners( """Replace both planner entry points so we can drive the error path deterministically — no LLM calls, no GitHub network.""" - async def _bad_main(goal, repo_full_name, token=None, branch_name=None): + async def _bad_main(goal, repo_full_name, token=None, branch_name=None, **_kw): raise RuntimeError(main_error) - async def _bad_lite(goal, repo_full_name, token=None, branch_name=None): + async def _bad_lite(goal, repo_full_name, token=None, branch_name=None, **_kw): if lite_error is None: return {"goal": goal, "summary": "lite ok", "steps": []} raise RuntimeError(lite_error) @@ -140,7 +140,7 @@ def test_unknown_runtime_error_is_wrapped_as_500_with_detail( def test_planner_success_passes_through( client: TestClient, monkeypatch: pytest.MonkeyPatch, ) -> None: - async def _ok(goal, repo_full_name, token=None, branch_name=None): + async def _ok(goal, repo_full_name, token=None, branch_name=None, **_kw): return {"goal": goal, "summary": "real plan", "steps": []} monkeypatch.setattr(api_module, "generate_plan", _ok) diff --git a/tests/test_edit_backend.py b/tests/test_edit_backend.py new file mode 100644 index 
0000000..ed86948 --- /dev/null +++ b/tests/test_edit_backend.py @@ -0,0 +1,327 @@ +"""Tests for the surgical edit backend (Batch B8). + +Pin every safety property the executor relies on: + +* ``apply_edit`` refuses ambiguous matches by default (the contract + that makes Claude Code's ``Edit`` tool reliable across models). +* Zero matches → clear error with a stripped-form hint when the only + difference is indentation. +* Identical old/new → refused so the planner can't accidentally + commit a no-op masquerading as a fix. +* Unified diffs apply by *context match*, so stale line numbers + don't matter. +* Multi-hunk diffs track a running offset so the second hunk lands + in the right place even after the first changed the file's length. +* Multi-file diffs (more than one ``diff --git`` header) are + refused — single file at a time. +* Trailing newline state is preserved in both directions. +* Pathologically big inputs (2 000-line file, one-line edit) apply + in well under a second. +""" +from __future__ import annotations + +import time + +import pytest + +from gitpilot.edit_backend import ( + EditError, + EditReport, + apply_edit, + apply_unified_diff, +) + + +# ---------------------------------------------------------------------- +# apply_edit — happy paths +# ---------------------------------------------------------------------- + +def test_apply_edit_single_match() -> None: + src = "alpha\nbeta\ngamma\n" + new, rpt = apply_edit(src, old_string="beta", new_string="BETA") + assert new == "alpha\nBETA\ngamma\n" + assert rpt.occurrences_replaced == 1 + assert rpt.bytes_before == len(src) + assert rpt.bytes_after == len(new) + + +def test_apply_edit_multiline_with_indentation() -> None: + src = ( + "def foo():\n" + " x = 1\n" + " y = 2\n" + " return x + y\n" + ) + new, rpt = apply_edit( + src, + old_string=" x = 1\n y = 2", + new_string=" x = 10\n y = 20", + ) + assert "x = 10" in new and "y = 20" in new + assert rpt.occurrences_replaced == 1 + + +def 
test_apply_edit_deletes_with_empty_new_string() -> None: + src = "keep\nremove\nkeep\n" + new, rpt = apply_edit(src, old_string="remove\n", new_string="") + assert new == "keep\nkeep\n" + assert rpt.occurrences_replaced == 1 + + +def test_apply_edit_expected_occurrences_n() -> None: + src = "a\na\nb\n" + new, rpt = apply_edit(src, old_string="a", new_string="X", expected_occurrences=2) + assert new == "X\nX\nb\n" + assert rpt.occurrences_replaced == 2 + + +def test_apply_edit_expected_minus_one_replace_all() -> None: + src = "x\nx\nx\nx\n" + new, rpt = apply_edit(src, old_string="x", new_string="Y", expected_occurrences=-1) + assert new == "Y\nY\nY\nY\n" + assert rpt.occurrences_replaced == 4 + + +# ---------------------------------------------------------------------- +# apply_edit — refusal paths +# ---------------------------------------------------------------------- + +def test_apply_edit_refuses_zero_matches() -> None: + with pytest.raises(EditError, match="not found"): + apply_edit("a\nb\nc\n", old_string="DOESNOTEXIST", new_string="X") + + +def test_apply_edit_zero_match_hint_on_indentation_mismatch() -> None: + """Most common cause of a missing match in Python: the agent + copied the line WITH the wrong leading indentation. The error + must hint at that so the agent can recover. + + Here the file indents with spaces but the agent's edit uses a tab: + the bare substring isn't in the file, but stripped(old) is. + """ + # File uses spaces for indentation; agent's edit accidentally + # used a tab — substring no longer matches, but stripped form does.
+ src = "def foo():\n return value\n" + try: + apply_edit( + src, + old_string="\treturn value", + new_string="\treturn new_value", + ) + except EditError as e: + assert "indentation" in str(e).lower() + else: + pytest.fail("expected EditError") + + +def test_apply_edit_refuses_ambiguous_match() -> None: + with pytest.raises(EditError, match="occurs 3 time"): + apply_edit("a\na\na\nb\n", old_string="a", new_string="X") + + +def test_apply_edit_refuses_identical_old_new() -> None: + with pytest.raises(EditError, match="identical"): + apply_edit("x\n", old_string="x", new_string="x") + + +def test_apply_edit_refuses_empty_old_string() -> None: + with pytest.raises(EditError, match="empty"): + apply_edit("x\n", old_string="", new_string="y") + + +def test_apply_edit_unexpected_count_at_expected_2_but_3_present() -> None: + with pytest.raises(EditError, match="occurs 3 time"): + apply_edit("a\na\na\n", old_string="a", new_string="X", expected_occurrences=2) + + +# ---------------------------------------------------------------------- +# apply_edit — performance on big files +# ---------------------------------------------------------------------- + +def test_apply_edit_on_2000_line_file_is_fast() -> None: + big = "\n".join(f"line {i}" for i in range(1, 2001)) + "\n" + t0 = time.perf_counter() + new, rpt = apply_edit(big, old_string="line 1482", new_string="line FIXED") + elapsed = time.perf_counter() - t0 + assert rpt.occurrences_replaced == 1 + assert "line FIXED" in new + assert "line 1481" in new and "line 1483" in new + # Must be near-instant; if this ever drifts to seconds the algo + # has regressed. 
+ assert elapsed < 0.5 + + +# ---------------------------------------------------------------------- +# apply_unified_diff — happy paths +# ---------------------------------------------------------------------- + +def test_apply_unified_diff_single_hunk() -> None: + content = "line A\nline B\nline C\nline D\n" + diff = ( + "@@ -1,3 +1,3 @@\n" + " line A\n" + "-line B\n" + "+line BB\n" + " line C\n" + ) + new, rpt = apply_unified_diff(content, diff) + assert new == "line A\nline BB\nline C\nline D\n" + assert rpt.occurrences_replaced == 1 + + +def test_apply_unified_diff_multiple_hunks_with_offset_tracking() -> None: + content = ( + "header\n" + "alpha\n" + "beta\n" + "gamma\n" + "delta\n" + "epsilon\n" + "footer\n" + ) + diff = ( + "@@ -1,3 +1,4 @@\n" + " header\n" + " alpha\n" + "+inserted-1\n" + " beta\n" + "@@ -5,3 +6,3 @@\n" + " delta\n" + "-epsilon\n" + "+EPSILON\n" + " footer\n" + ) + new, rpt = apply_unified_diff(content, diff) + assert "inserted-1" in new + assert "EPSILON" in new + assert rpt.occurrences_replaced == 2 + + +def test_apply_unified_diff_tolerates_stale_line_numbers() -> None: + """The whole point of context-match: if some earlier edit moved + the target lines, the hunk header's line numbers are wrong, but + the context is still correct. 
We match by context, not numbers.""" + content = "line A\nline B\nline C\n" + diff = ( + "@@ -999,2 +999,2 @@\n" + " line A\n" + "-line B\n" + "+line BB\n" + ) + new, rpt = apply_unified_diff(content, diff) + assert new == "line A\nline BB\nline C\n" + + +def test_apply_unified_diff_picks_nearest_match_when_ambiguous() -> None: + """When the context appears twice, we land near the line number + the hunk advertised.""" + content = ( + "X\n" # 1 + "Y\n" # 2 + "X\n" # 3 + "Y\n" # 4 + "X\n" # 5 + "Y\n" # 6 + "X\n" # 7 + ) + diff = ( + "@@ -5,3 +5,3 @@\n" + " X\n" + "-Y\n" + "+Z\n" + " X\n" + ) + new, _ = apply_unified_diff(content, diff) + # The hunk targeted lines 5-7 ("X Y X" near line 5), so the + # third occurrence wins. Lines 5,6,7 become X,Z,X — earlier + # occurrences are untouched. + assert new.splitlines() == ["X", "Y", "X", "Y", "X", "Z", "X"] + + +def test_apply_unified_diff_with_file_headers_ignored() -> None: + """Real-world diffs from git often come with --- / +++ headers + before the @@ hunks; we must ignore the preamble.""" + content = "line A\nline B\n" + diff = ( + "--- a/foo.py\n" + "+++ b/foo.py\n" + "@@ -1,2 +1,2 @@\n" + " line A\n" + "-line B\n" + "+line BB\n" + ) + new, _ = apply_unified_diff(content, diff) + assert new == "line A\nline BB\n" + + +def test_apply_unified_diff_preserves_trailing_newline() -> None: + """No-newline-at-EOF state must be preserved.""" + a = "x\ny\nz" # no trailing newline + diff = "@@ -1,3 +1,3 @@\n x\n-y\n+Y\n z\n" + new, _ = apply_unified_diff(a, diff) + assert not new.endswith("\n") + + +# ---------------------------------------------------------------------- +# apply_unified_diff — refusal paths +# ---------------------------------------------------------------------- + +def test_apply_unified_diff_refuses_missing_context() -> None: + content = "alpha\nbeta\n" + diff = ( + "@@ -1,2 +1,2 @@\n" + " WRONG_CONTEXT\n" + "-beta\n" + "+BETA\n" + ) + with pytest.raises(EditError, match="locate the hunk"): + 
apply_unified_diff(content, diff) + + +def test_apply_unified_diff_refuses_empty_diff() -> None: + with pytest.raises(EditError, match="empty"): + apply_unified_diff("x\n", "") + + +def test_apply_unified_diff_refuses_multi_file_patch() -> None: + """Two ``diff --git`` headers → caller must split the patch.""" + diff = ( + "diff --git a/x b/x\n" + "@@ -1,1 +1,1 @@\n" + "-x\n" + "+y\n" + "diff --git a/y b/y\n" + "@@ -1,1 +1,1 @@\n" + "-a\n" + "+b\n" + ) + with pytest.raises(EditError, match="multi-file"): + apply_unified_diff("x\n", diff) + + +def test_apply_unified_diff_refuses_malformed_hunk_line() -> None: + diff = ( + "@@ -1,2 +1,2 @@\n" + " alpha\n" + "GARBAGE_LINE_NO_PREFIX\n" + "-beta\n" + "+BETA\n" + ) + with pytest.raises(EditError, match="malformed hunk"): + apply_unified_diff("alpha\nbeta\n", diff) + + +def test_apply_unified_diff_refuses_no_hunks() -> None: + diff = "--- a/foo\n+++ b/foo\n" + with pytest.raises(EditError, match="no @@ hunks"): + apply_unified_diff("x\n", diff) + + +# ---------------------------------------------------------------------- +# EditReport shape +# ---------------------------------------------------------------------- + +def test_edit_report_is_frozen_dataclass() -> None: + rpt = EditReport(occurrences_replaced=1, bytes_before=10, bytes_after=12) + with pytest.raises(Exception): + rpt.occurrences_replaced = 99 # type: ignore[misc] diff --git a/tests/test_execute_persists_session_branch.py b/tests/test_execute_persists_session_branch.py new file mode 100644 index 0000000..cc09d27 --- /dev/null +++ b/tests/test_execute_persists_session_branch.py @@ -0,0 +1,219 @@ +"""Regression tests for session-branch persistence on /api/chat/execute. + +Bug: + A session was created on ``master``, the user approved a plan, the + executor created a fresh ``gitpilot--`` branch and pushed + to it. 
``Session.branch`` was *never updated* to that new branch — + so reopening the session next day jumped back to ``master`` and the + user couldn't find their work. + +Fix: + ``/api/chat/execute`` now accepts an optional ``session_id``. When + supplied AND the executor returns a branch name, the endpoint loads + the session, writes the new branch onto both ``session.branch`` and + (if present) the matching ``session.repos[i].branch``, and saves. + +These tests pin every branch of that contract. +""" +from __future__ import annotations + +from typing import Any, Iterator + +import pytest +from fastapi.testclient import TestClient + +from gitpilot import api as api_module + + +@pytest.fixture() +def client() -> Iterator[TestClient]: + yield TestClient(api_module.app) + + +def _stub_executor(branch: str) -> None: + """Replace the real execute path with a deterministic stub that + returns a fixed branch and step count.""" + + async def _fake_execute(plan, repo_full_name, token=None, branch_name=None): + return { + "status": "completed", + "message": "ok", + "branch": branch, + "executionLog": [{"step": 1, "title": "noop"}], + } + + api_module.execute_plan = _fake_execute # type: ignore[assignment] + api_module.execute_plan_lite = _fake_execute # type: ignore[assignment] + + +def _stub_auth_and_context(monkeypatch: pytest.MonkeyPatch) -> None: + """Bypass the GitHub-token + execution_context plumbing so the + test doesn't need a network or credentials.""" + from contextlib import contextmanager + + @contextmanager + def _noop_ctx(*_a: Any, **_kw: Any) -> Iterator[None]: + yield + + monkeypatch.setattr(api_module, "execution_context", _noop_ctx) + monkeypatch.setattr(api_module, "get_github_token", lambda *_a, **_kw: None) + monkeypatch.setattr(api_module, "_is_lite_mode_active", lambda: True) + + +def _plan_payload() -> dict: + """Minimum PlanResult-shaped dict the endpoint will accept.""" + return { + "goal": "do thing", + "summary": "noop", + "steps": [], + } + + +# 
---------------------------------------------------------------------- +# Branch is written through to the session record +# ---------------------------------------------------------------------- + +def test_execute_persists_new_branch_on_session( + client: TestClient, monkeypatch: pytest.MonkeyPatch, +) -> None: + _stub_auth_and_context(monkeypatch) + _stub_executor(branch="gitpilot-do-thing-123456") + + session = api_module._session_mgr.create( + repo_full_name="owner/repo", branch="master", name="branch-persist" + ) + + resp = client.post( + "/api/chat/execute", + json={ + "repo_owner": "owner", + "repo_name": "repo", + "plan": _plan_payload(), + "branch_name": None, + "session_id": session.id, + }, + ) + assert resp.status_code == 200, resp.text + assert resp.json()["branch"] == "gitpilot-do-thing-123456" + + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.branch == "gitpilot-do-thing-123456", ( + "session.branch must be updated to the branch the executor wrote to" + ) + + +def test_execute_updates_matching_repos_entry( + client: TestClient, monkeypatch: pytest.MonkeyPatch, +) -> None: + """Multi-repo sessions: repos[i].branch for the matching full_name + is updated alongside the legacy session.branch field.""" + _stub_auth_and_context(monkeypatch) + _stub_executor(branch="gitpilot-multi-987654") + + session = api_module._session_mgr.create( + repo_full_name="owner/repo", branch="master", name="multi-repo" + ) + session.repos = [ + {"full_name": "owner/repo", "branch": "master", "mode": "write"}, + {"full_name": "owner/other", "branch": "trunk", "mode": "read"}, + ] + session.active_repo = "owner/repo" + api_module._session_mgr.save(session) + + resp = client.post( + "/api/chat/execute", + json={ + "repo_owner": "owner", + "repo_name": "repo", + "plan": _plan_payload(), + "branch_name": None, + "session_id": session.id, + }, + ) + assert resp.status_code == 200, resp.text + + reloaded = api_module._session_mgr.load(session.id) + 
write_entry = next(r for r in reloaded.repos if r["full_name"] == "owner/repo") + read_entry = next(r for r in reloaded.repos if r["full_name"] == "owner/other") + assert write_entry["branch"] == "gitpilot-multi-987654" + # Untouched second repo retains its original branch. + assert read_entry["branch"] == "trunk" + + +# ---------------------------------------------------------------------- +# Backwards compatibility: no session_id, no error +# ---------------------------------------------------------------------- + +def test_execute_without_session_id_is_backwards_compatible( + client: TestClient, monkeypatch: pytest.MonkeyPatch, +) -> None: + _stub_auth_and_context(monkeypatch) + _stub_executor(branch="gitpilot-anon-111111") + + resp = client.post( + "/api/chat/execute", + json={ + "repo_owner": "owner", + "repo_name": "repo", + "plan": _plan_payload(), + "branch_name": None, + }, + ) + assert resp.status_code == 200, resp.text + # Result still carries the executor's branch — that's the + # frontend's only signal in the legacy path. 
+ assert resp.json()["branch"] == "gitpilot-anon-111111" + + +def test_execute_unknown_session_id_does_not_500( + client: TestClient, monkeypatch: pytest.MonkeyPatch, +) -> None: + """A stale session id from the frontend cache must not poison the + execute result — the user already has their commit published.""" + _stub_auth_and_context(monkeypatch) + _stub_executor(branch="gitpilot-stale-222222") + + resp = client.post( + "/api/chat/execute", + json={ + "repo_owner": "owner", + "repo_name": "repo", + "plan": _plan_payload(), + "branch_name": None, + "session_id": "does-not-exist", + }, + ) + assert resp.status_code == 200, resp.text + + +# ---------------------------------------------------------------------- +# Sticky mode: caller already on the session branch +# ---------------------------------------------------------------------- + +def test_execute_sticky_mode_keeps_branch( + client: TestClient, monkeypatch: pytest.MonkeyPatch, +) -> None: + """When the request specifies branch_name (sticky mode), the + session must record the same branch — this is what reopening the + session lands on next time.""" + _stub_auth_and_context(monkeypatch) + _stub_executor(branch="gitpilot-sticky-333333") + + session = api_module._session_mgr.create( + repo_full_name="owner/repo", branch="master", name="sticky" + ) + + resp = client.post( + "/api/chat/execute", + json={ + "repo_owner": "owner", + "repo_name": "repo", + "plan": _plan_payload(), + "branch_name": "gitpilot-sticky-333333", + "session_id": session.id, + }, + ) + assert resp.status_code == 200, resp.text + + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.branch == "gitpilot-sticky-333333" diff --git a/tests/test_explorer_summary.py b/tests/test_explorer_summary.py new file mode 100644 index 0000000..9dc12e7 --- /dev/null +++ b/tests/test_explorer_summary.py @@ -0,0 +1,159 @@ +"""Tests for the explorer-report compressor (Batch B5). 
+ +Pin the contract the planner relies on: + +* Header is preserved so the planner's existing prompt template + matches. +* Every file path the explorer listed survives compression OR is + replaced by an honest "…N more files" marker. +* Compression is a strict no-op when: + - the flag is off + - the report is already under budget + - the report is empty +* On a pathologically large input, the compressed output is always + ≤ the budget plus a small slack — never an order of magnitude over. +""" +from __future__ import annotations + +import pytest + +from gitpilot import flags +from gitpilot.context_budget import estimate_tokens +from gitpilot.explorer_summary import ( + DEFAULT_TOKEN_BUDGET, + FLAG_SUBAGENT_EXPLORER, + MAX_FILES_LISTED, + compress_exploration_report, +) + + +SHORT_REPORT = """\ +REPOSITORY EXPLORATION REPORT +============================= + +Files Found: + - README.md + - src/main.py + - tests/test_main.py + +Key Files: + - README.md + +Directory Structure: +README.md +src/main.py +tests/test_main.py + +File Types: md=1, py=2 +""" + + +def _make_large_report(n_files: int) -> str: + """Synthesise an explorer report listing N files under src/.""" + files = [f" - src/mod_{i:04d}/file_{j}.py" for i in range(n_files // 5) for j in range(5)] + return ( + "REPOSITORY EXPLORATION REPORT\n" + "=============================\n" + "\n" + "Files Found:\n" + + "\n".join(files) + + "\n\nKey Files:\n - README.md\n - src/__init__.py\n" + ) + + +# ---------------------------------------------------------------------- +# No-op paths +# ---------------------------------------------------------------------- + +def test_short_report_passes_through_unchanged() -> None: + out, metrics = compress_exploration_report(SHORT_REPORT) + assert out == SHORT_REPORT + assert metrics.reason == "under budget" + assert metrics.compressed_tokens == metrics.original_tokens + + +def test_empty_report_is_returned_as_is() -> None: + out, metrics = compress_exploration_report("") + assert 
out == "" + assert metrics.reason == "empty report" + + +def test_flag_off_is_a_noop() -> None: + big = _make_large_report(500) + flags.set_override(FLAG_SUBAGENT_EXPLORER, False) + try: + out, metrics = compress_exploration_report(big) + finally: + flags.clear_override(FLAG_SUBAGENT_EXPLORER) + assert out == big + assert metrics.reason == "flag off" + + +# ---------------------------------------------------------------------- +# Actual compression +# ---------------------------------------------------------------------- + +def test_large_report_shrinks_under_budget() -> None: + big = _make_large_report(500) + out, metrics = compress_exploration_report(big, token_budget=800) + # Token count must drop, hard. + assert metrics.original_tokens > metrics.compressed_tokens + # Budget honoured (with a tiny safety slack — the loop trims more + # if we overshoot, so the final result is at most the budget). + assert metrics.compressed_tokens <= 900 + + +def test_compressed_report_preserves_header_for_planner() -> None: + """The planner template injects the report under a fixed header. + Compression must keep that header so the template still works.""" + big = _make_large_report(500) + out, _ = compress_exploration_report(big, token_budget=800) + assert "REPOSITORY EXPLORATION REPORT" in out + assert "Files Found:" in out + + +def test_compression_caps_files_listed() -> None: + big = _make_large_report(500) # 500 files + out, metrics = compress_exploration_report(big, token_budget=800) + # Should never list more than the documented hard cap under the + # "Files Found:" section (which is what the planner counts). + # Filter to lines that look like the synthetic ``mod_XXXX/file_X.py`` + # entries so the Key Files entry (``src/__init__.py``) doesn't + # accidentally inflate the count. 
+ listed = [ln for ln in out.splitlines() if "mod_" in ln] + assert len(listed) <= MAX_FILES_LISTED + assert metrics.truncated is True + assert metrics.files_in_original == 500 + # Honest "N more files" marker present. + assert "more files" in out + + +def test_compression_keeps_first_files() -> None: + """Order should be preserved when we trim — first N files survive + so the planner sees the "earliest discovered" ones (typically the + most important: top-level, then breadth-first by directory).""" + big = _make_large_report(500) + out, _ = compress_exploration_report(big, token_budget=800) + # First file in the synthetic list is src/mod_0000/file_0.py + assert "src/mod_0000/file_0.py" in out + + +def test_compression_preserves_key_files_section() -> None: + """Key Files is the planner's anchor — it must survive even under + aggressive compression.""" + big = _make_large_report(500) + out, _ = compress_exploration_report(big, token_budget=400) + assert "Key Files:" in out + + +def test_pathological_input_does_not_explode() -> None: + """A 10 000-file report should still produce a bounded output. + No quadratic blow-ups, no recursion errors.""" + big = _make_large_report(10_000) + out, metrics = compress_exploration_report(big, token_budget=800) + assert metrics.compressed_tokens <= 1_200 # generous slack + assert "more files" in out + + +def test_default_budget_matches_documented_value() -> None: + assert DEFAULT_TOKEN_BUDGET == 800 diff --git a/tests/test_glob_and_windowed_read.py b/tests/test_glob_and_windowed_read.py new file mode 100644 index 0000000..a672a5b --- /dev/null +++ b/tests/test_glob_and_windowed_read.py @@ -0,0 +1,137 @@ +"""Tests for the Glob / windowed-Read tools (Batch B1). + +Pin both: + +* The pure-Python ``_glob_match`` semantics — `/`-aware, `**`-as-any. + These mirror Claude Code / ripgrep / bash, NOT the looser ``fnmatch`` + default. 
+* The windowed-Read helpers (``_coerce_int`` and the default/max + constants) so future refactors don't silently widen them. +""" +from __future__ import annotations + +import pytest + +from gitpilot.agent_tools import ( + GLOB_DEFAULT_MAX_RESULTS, + GLOB_HARD_MAX_RESULTS, + READ_DEFAULT_LIMIT, + READ_MAX_LIMIT, + _coerce_int, + _glob_match, + _glob_to_regex, +) + + +PATHS = [ + "README.md", + "LICENSE", + "src/main.py", + "src/util/io.py", + "src/util/__init__.py", + "tests/test_main.py", + "tests/unit/test_util.py", + "docs/intro.md", + "docs/guide/setup.md", + ".github/workflows/ci.yml", +] + + +# ---------------------------------------------------------------------- +# Glob semantics +# ---------------------------------------------------------------------- + +@pytest.mark.parametrize( + "pattern, expected", + [ + ("**/*.py", [ + "src/main.py", + "src/util/__init__.py", + "src/util/io.py", + "tests/test_main.py", + "tests/unit/test_util.py", + ]), + ("src/**/*.py", [ + "src/main.py", + "src/util/__init__.py", + "src/util/io.py", + ]), + ("**/test_*.py", [ + "tests/test_main.py", + "tests/unit/test_util.py", + ]), + ("*.md", ["README.md"]), # top-level only + ("**/*.md", ["README.md", "docs/guide/setup.md", "docs/intro.md"]), + ("docs/*.md", ["docs/intro.md"]), # immediate child of docs + ("docs/**/*.md", ["docs/guide/setup.md", "docs/intro.md"]), + ("README*", ["README.md"]), + ("LICENSE", ["LICENSE"]), + (".github/**/*.yml", [".github/workflows/ci.yml"]), + ("nope/*.py", []), + ], +) +def test_glob_match_matches_expected_paths(pattern: str, expected: list[str]) -> None: + got = sorted(_glob_match(PATHS, pattern)) + assert got == sorted(expected), f"pattern={pattern!r}" + + +def test_star_does_not_cross_slash() -> None: + """`*` must NOT consume `/` — this is the contract that + distinguishes shell glob from fnmatch. 
``src/*`` therefore matches + only direct children of ``src``, never nested ones.""" + got = sorted(_glob_match(PATHS, "src/*")) + assert got == ["src/main.py"] + + +def test_double_star_crosses_slash() -> None: + """`**` must consume any number of segments, including zero.""" + rx = _glob_to_regex("**/foo.py") + assert rx.match("foo.py") # zero-segment case + assert rx.match("src/foo.py") + assert rx.match("a/b/c/foo.py") + assert not rx.match("foo.txt") + + +def test_character_class_passes_through() -> None: + rx = _glob_to_regex("test_[ab].py") + assert rx.match("test_a.py") + assert rx.match("test_b.py") + assert not rx.match("test_c.py") + + +def test_question_mark_matches_single_non_slash() -> None: + rx = _glob_to_regex("a?.py") + assert rx.match("ab.py") + assert not rx.match("a/b.py") + assert not rx.match("abc.py") + + +# ---------------------------------------------------------------------- +# Defaults / bounds +# ---------------------------------------------------------------------- + +def test_constants_have_sensible_bounds() -> None: + assert 1 <= GLOB_DEFAULT_MAX_RESULTS <= GLOB_HARD_MAX_RESULTS + assert READ_DEFAULT_LIMIT < READ_MAX_LIMIT + assert READ_MAX_LIMIT <= 100_000 # belt-and-braces ceiling + + +# ---------------------------------------------------------------------- +# Coercion (CrewAI passes ints as strings, sometimes as schema dicts) +# ---------------------------------------------------------------------- + +@pytest.mark.parametrize( + "value, default, expected", + [ + (5, 999, 5), + ("5", 999, 5), + (" 12 ", 999, 12), + (None, 7, 7), + ("nope", 7, 7), + ({"description": "...", "type": "int"}, 7, 7), + (True, 7, 7), # bool must be ignored, not interpreted as 1 + (3.7, 7, 3), + ], +) +def test_coerce_int_handles_crewai_quirks(value, default, expected) -> None: + assert _coerce_int(value, default) == expected diff --git a/tests/test_grep_backend.py b/tests/test_grep_backend.py new file mode 100644 index 0000000..67f39f3 --- /dev/null +++ 
b/tests/test_grep_backend.py @@ -0,0 +1,196 @@ +"""Tests for the Grep backend (Batch B2). + +Pin both: + +* The in-memory ``grep`` (used by the GitHub-backed agent tool — files + arrive as a dict because there's no local checkout). +* The ``grep_local`` ripgrep / Python-walk fallback (used by local-git + + folder modes). The fallback is tested without ripgrep installed + by walking the temp directory directly; the ripgrep happy-path is + only exercised when ``rg`` is available on $PATH. +""" +from __future__ import annotations + +import shutil +from pathlib import Path + +import pytest + +from gitpilot.agent_tools import _glob_to_regex +from gitpilot.grep_backend import ( + GREP_DEFAULT_MAX_RESULTS, + GREP_HARD_MAX_RESULTS, + GrepResult, + format_result, + grep, + grep_local, +) + + +# ---------------------------------------------------------------------- +# In-memory backend (GitHub mode) +# ---------------------------------------------------------------------- + +FILES = { + "README.md": "GitPilot README\nSee CONTRIBUTING.md\n", + "src/main.py": ( + "import asyncio\n" + "def main():\n" + " print('hello')\n" + "\n" + "if __name__ == '__main__':\n" + " main()\n" + ), + "src/util/io.py": ( + "from typing import Any\n" + "def read(path: str) -> str:\n" + " return ''\n" + ), + "tests/test_main.py": ( + "from src.main import main\n" + "def test_main():\n" + " main()\n" + ), +} + + +def test_grep_finds_basic_token() -> None: + r = grep(FILES, "asyncio") + assert not r.error + assert r.backend == "python" + assert len(r.hits) == 1 + assert r.hits[0].path == "src/main.py" + assert r.hits[0].line == 1 + + +def test_grep_returns_multiple_hits_sorted() -> None: + r = grep(FILES, "def ") + # def main (main.py), def read (io.py), def test_main (test_main.py) + assert [h.path for h in r.hits] == [ + "src/main.py", + "src/util/io.py", + "tests/test_main.py", + ] + + +def test_grep_case_insensitive_flag() -> None: + assert grep(FILES, "ASYNC").hits == [] + r = grep(FILES, 
"ASYNC", case_insensitive=True) + assert len(r.hits) == 1 + assert r.hits[0].path == "src/main.py" + + +def test_grep_invalid_regex_returns_error() -> None: + r = grep(FILES, "[unterminated") + assert r.error and "invalid regex" in r.error + assert r.hits == [] + + +def test_grep_path_filter_via_glob() -> None: + rx = _glob_to_regex("**/*.py") + r = grep(FILES, "import", path_filter=rx) + assert all(h.path.endswith(".py") for h in r.hits) + # README.md mentions CONTRIBUTING.md, not import — but the filter + # would have excluded it anyway. Belt-and-braces. + + +def test_grep_truncates_above_cap() -> None: + big = {f"f{i}.txt": "needle\n" * 50 for i in range(20)} + r = grep(big, "needle", max_results=10) + assert r.truncated is True + assert len(r.hits) == 10 + + +def test_grep_respects_hard_cap_even_when_caller_asks_more() -> None: + big = {f"f{i}.txt": "needle\n" * 10 for i in range(GREP_HARD_MAX_RESULTS + 50)} + r = grep(big, "needle", max_results=10_000) + # capped at GREP_HARD_MAX_RESULTS, regardless of the caller value. 
+ assert len(r.hits) == GREP_HARD_MAX_RESULTS + assert r.truncated is True + + +def test_grep_empty_pattern_match_is_handled_gracefully() -> None: + """A pattern that matches every line should still respect the cap + rather than spinning on a multi-MB file.""" + big = {"big.txt": "x\n" * 10_000} + r = grep(big, ".", max_results=5) + assert len(r.hits) == 5 + assert r.truncated is True + + +def test_grep_skips_empty_files_quietly() -> None: + r = grep({"empty.txt": "", "main.py": "import os\n"}, "import") + assert [h.path for h in r.hits] == ["main.py"] + + +# ---------------------------------------------------------------------- +# Local backend (folder / local-git mode) +# ---------------------------------------------------------------------- + +def _seed_local_repo(tmp_path: Path) -> Path: + (tmp_path / "README.md").write_text("docs\n") + (tmp_path / "src").mkdir() + (tmp_path / "src" / "main.py").write_text("import asyncio\nprint('hi')\n") + (tmp_path / "src" / "util.py").write_text("def helper():\n return 1\n") + (tmp_path / "tests").mkdir() + (tmp_path / "tests" / "test_main.py").write_text("def test_x():\n assert True\n") + return tmp_path + + +def test_grep_local_python_fallback_finds_matches( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch +) -> None: + _seed_local_repo(tmp_path) + # Force the Python walk regardless of ripgrep availability. 
+ monkeypatch.setattr("gitpilot.grep_backend.shutil.which", lambda _: None) + r = grep_local(str(tmp_path), "import") + assert r.backend == "python" + assert any(h.path == "src/main.py" and h.line == 1 for h in r.hits) + + +def test_grep_local_respects_glob_filter( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch +) -> None: + _seed_local_repo(tmp_path) + monkeypatch.setattr("gitpilot.grep_backend.shutil.which", lambda _: None) + r = grep_local(str(tmp_path), "def ", glob_filter="src/**/*.py") + paths = {h.path for h in r.hits} + assert paths == {"src/util.py"} + + +@pytest.mark.skipif(shutil.which("rg") is None, reason="ripgrep not installed") +def test_grep_local_uses_ripgrep_when_available(tmp_path: Path) -> None: + _seed_local_repo(tmp_path) + r = grep_local(str(tmp_path), "import") + assert r.backend == "ripgrep" + assert any(h.path == "src/main.py" for h in r.hits) + + +# ---------------------------------------------------------------------- +# Formatter +# ---------------------------------------------------------------------- + +def test_format_result_no_hits() -> None: + out = format_result(GrepResult(hits=[]), pattern="xyz") + assert "No matches" in out + + +def test_format_result_with_hits_and_truncation() -> None: + r = grep(FILES, "def ") + out = format_result(r, pattern="def ") + assert "def main" in out + assert "src/main.py" in out + + +def test_format_result_emits_truncation_hint() -> None: + big = {f"f{i}.txt": "needle\n" for i in range(10)} + r = grep(big, "needle", max_results=3) + out = format_result(r, pattern="needle") + assert "truncated" in out.lower() + + +def test_grep_default_cap_is_reasonable() -> None: + """The default cap must be small enough to fit in any model's + context, big enough to be useful on realistic repos.""" + assert 50 <= GREP_DEFAULT_MAX_RESULTS <= 200 + assert GREP_HARD_MAX_RESULTS >= GREP_DEFAULT_MAX_RESULTS diff --git a/tests/test_query_router.py b/tests/test_query_router.py new file mode 100644 index 
0000000..50fa0c3 --- /dev/null +++ b/tests/test_query_router.py @@ -0,0 +1,290 @@ +"""Tests for the deterministic query router (Batch B9). + +The router is the auto-strategy layer that picks tools / triggers +RAG before the LLM even runs. Tests pin: + +* Intent classification on representative dev prompts (fix / find / + info / create / delete / modify / unknown). +* Target-file extraction: quoted, bareword, path-with-slash, with + / without repo verification. +* Fuzzy-query detection (natural language vs. symbol-shaped). +* RAG / auto-index decisions: only fired for read-leaning intents + on big enough repos. +* File-type policy: ``.py`` ⇒ surgical, ``.lock`` ⇒ reject. +* ``force_no_rag`` overrides the RAG recommendation. +* Hint rendering survives every combination without crashing. +""" +from __future__ import annotations + +import pytest + +from gitpilot.query_router import ( + DEFAULT_FUZZY_REPO_SIZE_FOR_RAG, + RouterDecision, + classify, + render_planner_hint, +) + + +# A medium-size synthetic repo (69 files: 9 named entries plus 60 +# generated modules), above the RAG-recommendation threshold. 
+REPO = ( + ["README.md", "pyproject.toml", "src/main.py", "src/auth.py", + "src/util.py", "tests/test_main.py", "Dockerfile", + "poetry.lock", "package-lock.json"] + + [f"src/mod_{i:02d}.py" for i in range(60)] +) + + +# ---------------------------------------------------------------------- +# Intent classification +# ---------------------------------------------------------------------- + +@pytest.mark.parametrize( + "goal, expected", + [ + ("Fix the TypeError in src/auth.py", "fix"), + ("fix bug in main()", "fix"), + ("the app crashes on startup", "fix"), + ("a regression happened after the last commit", "fix"), + + ("find calls to authenticate_user", "find"), + ("where do we handle login?", "find"), + ("search for the cache layer", "find"), + + ("what is this project about", "info"), + ("explain how does the planner work", "info"), + ("describe the agent pipeline", "info"), + + ("create a new helper for date formatting", "create"), + ("add a CLI command for export", "create"), + ("generate a Dockerfile for production", "create"), + + ("delete all the .lock files", "delete"), + ("remove the unused vendor folder", "delete"), + + ("modify pyproject.toml dependencies", "modify"), + ("update README.md installation steps", "modify"), + ("rename foo_bar to baz_qux", "modify"), + ("refactor the auth module", "modify"), + + ("plonk into snargs", "unknown"), # nonsense + ("", "unknown"), # empty + ], +) +def test_intent_classification(goal: str, expected: str) -> None: + d = classify(goal, repo_files=REPO) + assert d.intent == expected, f"goal={goal!r}" + + +# ---------------------------------------------------------------------- +# Target-file extraction +# ---------------------------------------------------------------------- + +def test_extracts_path_with_slash() -> None: + d = classify("Fix bug in src/auth.py", repo_files=REPO) + assert d.target_files == ["src/auth.py"] + + +def test_extracts_bareword_file_with_extension() -> None: + d = classify("Modify 
pyproject.toml", repo_files=REPO) + assert "pyproject.toml" in d.target_files + + +def test_extracts_quoted_path() -> None: + d = classify("Fix bug in `src/util.py`", repo_files=REPO) + assert "src/util.py" in d.target_files + + +def test_does_not_invent_files_absent_from_repo() -> None: + """A user typo like ``src/aut.py`` must not survive verification.""" + d = classify("Fix bug in src/aut.py", repo_files=REPO) + assert d.target_files == [] + + +def test_no_repo_files_keeps_raw_candidates() -> None: + """When the caller can't supply the repo file list (offline / first + request) we surface the raw candidates so the planner can still + use them as hints.""" + d = classify("Fix bug in src/auth.py", repo_files=None) + assert "src/auth.py" in d.target_files + + +# ---------------------------------------------------------------------- +# Fuzzy detection + RAG recommendation +# ---------------------------------------------------------------------- + +def test_fuzzy_query_with_big_repo_and_no_index_triggers_auto_index() -> None: + d = classify( + "where do we handle session tokens and auth refresh", + repo_files=REPO, + rag_index_exists=False, + ) + assert d.intent == "find" + assert d.rag_recommended is True + assert d.auto_index_repo is True + + +def test_fuzzy_query_with_existing_index_recommends_rag_but_not_rebuild() -> None: + d = classify( + "where do we handle session tokens and auth refresh", + repo_files=REPO, + rag_index_exists=True, + ) + assert d.rag_recommended is True + assert d.auto_index_repo is False + + +def test_force_no_rag_suppresses_rag_recommendation() -> None: + """The post-Reject retry path sets force_no_rag=True; router + must respect it.""" + d = classify( + "where do we handle session tokens", + repo_files=REPO, + force_no_rag=True, + ) + assert d.rag_recommended is False + assert d.auto_index_repo is False + + +def test_symbol_query_uses_grep_not_rag() -> None: + """A query that names a real symbol (camelCase / snake_case / + SCREAMING) is 
exact-search territory, not embeddings.""" + d = classify("find calls to authenticate_user", repo_files=REPO) + assert d.rag_recommended is False + assert "Search file contents" in d.tool_priority + + +def test_small_repo_skips_auto_index() -> None: + small = ["README.md", "main.py"] + d = classify( + "where do we handle authentication tokens", + repo_files=small, + ) + assert d.repo_too_small_for_rag is True + assert d.auto_index_repo is False + + +def test_info_intent_does_not_auto_index() -> None: + """'What is this project' is answered from the repo map; no need + to spin up the embedding pipeline.""" + d = classify("what does this project do", repo_files=REPO) + assert d.intent == "info" + assert d.auto_index_repo is False + + +def test_delete_intent_does_not_auto_index() -> None: + d = classify("delete all the unused tests", repo_files=REPO) + assert d.intent == "delete" + assert d.auto_index_repo is False + + +# ---------------------------------------------------------------------- +# File-type policy +# ---------------------------------------------------------------------- + +def test_python_target_marks_surgical_with_indentation_hint() -> None: + d = classify("Fix bug in src/auth.py", repo_files=REPO) + assert d.edit_strategy == "surgical" + assert "indentation-sensitive" in d.file_policy_notes + + +def test_lockfile_target_marks_reject() -> None: + """Generated lock files are off-limits — edit the manifest.""" + d = classify("Modify poetry.lock", repo_files=REPO) + assert d.edit_strategy == "reject" + assert "lock" in d.file_policy_notes.lower() + + +def test_markdown_target_marks_surgical() -> None: + d = classify("Update README.md installation steps", repo_files=REPO) + assert d.edit_strategy == "surgical" + + +# ---------------------------------------------------------------------- +# Tool priority +# ---------------------------------------------------------------------- + +def test_fix_with_target_recommends_read_then_edit() -> None: + d = 
classify("Fix bug in src/auth.py", repo_files=REPO) + assert d.tool_priority[:2] == [ + "Read file content", + "Edit a section of a file", + ] + + +def test_fix_without_target_recommends_grep_first() -> None: + """No specific file mentioned and not fuzzy → grep first.""" + d = classify("fix authenticate_user bug", repo_files=REPO) + assert d.tool_priority[0] in ( + "Search file contents", "Find code by semantic search", + ) + + +def test_fuzzy_fix_without_target_prefers_semantic_search() -> None: + d = classify( + "fix the issue where login tokens expire too quickly", + repo_files=REPO, + ) + # Fuzzy + no target → RAG first. + assert d.tool_priority[0] == "Find code by semantic search" + + +def test_delete_intent_uses_glob_first() -> None: + d = classify("delete all the unused tests", repo_files=REPO) + assert d.tool_priority[0] == "Find files matching a pattern" + + +def test_info_intent_recommends_read_only() -> None: + d = classify("what does this project do", repo_files=REPO) + # No write tools — info queries are read-only. 
+ write_tools = {"Edit a section of a file", "Write or update a file in the repository"} + assert not (set(d.tool_priority) & write_tools) + + +# ---------------------------------------------------------------------- +# Hint rendering +# ---------------------------------------------------------------------- + +def test_render_planner_hint_contains_intent_and_files() -> None: + d = classify("Fix bug in src/auth.py", repo_files=REPO) + hint = render_planner_hint(d) + assert "Intent" in hint and "fix" in hint + assert "src/auth.py" in hint + + +def test_render_planner_hint_mentions_index_step_when_auto_index() -> None: + d = classify( + "where do we handle session tokens", + repo_files=REPO, + rag_index_exists=False, + ) + hint = render_planner_hint(d) + assert "INDEX" in hint # the planner must know to include the step + assert "one-time" in hint + + +def test_render_planner_hint_skips_index_step_when_consent_exists() -> None: + d = classify( + "where do we handle session tokens", + repo_files=REPO, + rag_index_exists=True, + ) + hint = render_planner_hint(d) + assert "INDEX" not in hint + + +def test_router_decision_is_frozen() -> None: + d = classify("info", repo_files=REPO) + with pytest.raises(Exception): + d.intent = "fix" # type: ignore[misc] + + +# ---------------------------------------------------------------------- +# DEFAULT_FUZZY_REPO_SIZE_FOR_RAG sanity +# ---------------------------------------------------------------------- + +def test_threshold_is_sensible() -> None: + """The RAG threshold must be high enough to skip toy repos and + low enough that mid-size projects benefit.""" + assert 10 <= DEFAULT_FUZZY_REPO_SIZE_FOR_RAG <= 200 diff --git a/tests/test_rag_consent.py b/tests/test_rag_consent.py new file mode 100644 index 0000000..e2eb9b2 --- /dev/null +++ b/tests/test_rag_consent.py @@ -0,0 +1,146 @@ +"""Tests for per-repo RAG consent (Batch B9). + +Pin every contract the router and executor rely on: + +* Initial state: no consent. 
+* Grant writes a readable, well-formed JSON file with a UTC + timestamp. Idempotent. +* Revoke deletes the consent file *and* wipes every per-branch + index directory under the same repo. Returns ``True`` only when + something was actually removed; subsequent revokes return False. +* Malformed consent files count as "no consent" (fail closed). +* Paths with unsafe characters are sanitised — owner/repo of the + form ``"my org/weird repo"`` cannot escape the consent root. +""" +from __future__ import annotations + +import json +from pathlib import Path + +import pytest + +from gitpilot.rag_consent import ( + CONSENT_FILE, + ConsentRecord, + grant_consent, + has_consent, + load_record, + revoke_consent, +) + + +@pytest.fixture(autouse=True) +def _isolated_rag_root(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path: + monkeypatch.setenv("GITPILOT_RAG_ROOT", str(tmp_path)) + return tmp_path + + +# ---------------------------------------------------------------------- +# Round-trip +# ---------------------------------------------------------------------- + +def test_initial_state_is_no_consent() -> None: + assert has_consent("o", "r") is False + assert load_record("o", "r") is None + + +def test_grant_creates_well_formed_file(tmp_path: Path) -> None: + rec = grant_consent("o", "r", granted_by="alice") + assert isinstance(rec, ConsentRecord) + assert rec.granted_by == "alice" + assert has_consent("o", "r") is True + # Verify the on-disk shape so the router can read it from a + # different process. + cpath = tmp_path / "o" / "r" / CONSENT_FILE + raw = json.loads(cpath.read_text(encoding="utf-8")) + assert "granted_at" in raw + assert raw["granted_by"] == "alice" + + +def test_grant_is_idempotent() -> None: + grant_consent("o", "r") + grant_consent("o", "r", granted_by="other") + # Second call must not error; the record reflects the latest grant. 
+ rec = load_record("o", "r") + assert rec is not None + assert rec.granted_by == "other" + + +def test_grant_without_granted_by_works() -> None: + rec = grant_consent("o", "r") + assert rec.granted_by is None + + +def test_revoke_returns_true_when_something_existed() -> None: + grant_consent("o", "r") + assert revoke_consent("o", "r") is True + assert has_consent("o", "r") is False + + +def test_revoke_returns_false_when_nothing_to_revoke() -> None: + assert revoke_consent("o", "r") is False + + +def test_revoke_wipes_per_branch_index_directories(tmp_path: Path) -> None: + """Revoke must remove every per-branch index directory under the consent + root — that's the index data we shouldn't keep once consent is + withdrawn.""" + grant_consent("o", "r") + # Simulate a built index on two branches. + (tmp_path / "o" / "r" / "main").mkdir(parents=True) + (tmp_path / "o" / "r" / "feature_x").mkdir(parents=True) + (tmp_path / "o" / "r" / "main" / "data.bin").write_text("x") + (tmp_path / "o" / "r" / "feature_x" / "data.bin").write_text("y") + + assert revoke_consent("o", "r") is True + assert not (tmp_path / "o" / "r" / "main").exists() + assert not (tmp_path / "o" / "r" / "feature_x").exists() + + +# ---------------------------------------------------------------------- +# Failure modes — fail closed +# ---------------------------------------------------------------------- + +def test_malformed_consent_file_counts_as_no_consent(tmp_path: Path) -> None: + cdir = tmp_path / "o" / "r" + cdir.mkdir(parents=True) + (cdir / CONSENT_FILE).write_text("not json at all", encoding="utf-8") + assert has_consent("o", "r") is False + assert load_record("o", "r") is None + + +def test_consent_file_with_wrong_shape_counts_as_no_consent( + tmp_path: Path, +) -> None: + cdir = tmp_path / "o" / "r" + cdir.mkdir(parents=True) + (cdir / CONSENT_FILE).write_text(json.dumps(["wrong", "shape"]), + encoding="utf-8") + assert has_consent("o", "r") is False + + +def test_missing_owner_or_repo_is_no_consent() ->
None: + assert has_consent("", "r") is False + assert has_consent("o", "") is False + assert revoke_consent("", "r") is False + + +def test_grant_rejects_empty_args() -> None: + with pytest.raises(ValueError): + grant_consent("", "r") + + +# ---------------------------------------------------------------------- +# Path sanitisation — owner/repo with unsafe characters +# ---------------------------------------------------------------------- + +def test_unsafe_owner_chars_sanitised(tmp_path: Path) -> None: + """A malicious-looking owner string must not escape the + consent root via ``..`` or absolute paths.""" + grant_consent("../etc", "r") + # The consent file is somewhere under tmp_path, not outside it. + matches = list(tmp_path.rglob(CONSENT_FILE)) + assert matches, "consent file not created under the sanitised root" + for m in matches: + # str(m) starts with tmp_path string — never above it. + assert str(m).startswith(str(tmp_path)) diff --git a/tests/test_rag_pipeline.py b/tests/test_rag_pipeline.py new file mode 100644 index 0000000..a15bd70 --- /dev/null +++ b/tests/test_rag_pipeline.py @@ -0,0 +1,374 @@ +"""Tests for the local RAG pipeline (Batch B7). + +All tests use the dependency-free ``HashingEmbedder`` so the suite +doesn't need to download the 80 MB MiniLM ONNX model. The +production path (DefaultEmbedder via ChromaDB's bundled MiniLM) is +exercised by a single smoke test that's skipped when the binary +isn't available. + +Coverage: + +* Chunker: line-window with overlap, binary skip, oversize skip, + determinism, empty/whitespace edge cases. +* HashingEmbedder: deterministic, dimension respected, similar text + ranks higher than unrelated text. +* RagStore: round-trip add → count → query → delete_by_path; the + ChromaDB persistence path survives a fresh process restart. +* Indexer: incremental build (unchanged files skipped), embedder- + change triggers full rebuild, deleted files drop from the index. 
+* Retriever: top-k respected, MMR diversifies same-file hits, + empty index returns [] not error. +""" +from __future__ import annotations + +import hashlib +from pathlib import Path + +import pytest + +from gitpilot.rag import ( + HashingEmbedder, + IndexMeta, + build_index_from_files, + chunk_file, + retrieve_top_k, +) +from gitpilot.rag.chunker import ( + CHUNK_LINES, + CHUNK_OVERLAP, + MAX_FILE_BYTES, + Chunk, +) +from gitpilot.rag.embedder import cosine_similarity +from gitpilot.rag.indexer import build_index_from_files as build_idx +from gitpilot.rag.store import RagStore, _collection_name, _sanitize + + +# ---------------------------------------------------------------------- +# Chunker +# ---------------------------------------------------------------------- + +def test_chunk_file_basic_window_with_overlap() -> None: + content = "\n".join(f"line {i}" for i in range(1, 101)) # 100 lines + chunks = chunk_file("a.py", content) + # 100 lines, 40-line windows, 5-line overlap, step=35: windows + # start at 1, 36, 71 (last one extends to line 100). + assert len(chunks) == 3 + starts = [c.start_line for c in chunks] + assert starts == [1, 36, 71] + ends = [c.end_line for c in chunks] + assert ends == [40, 75, 100] + # IDs are deterministic and prefixed by a hash of the path. 
+ for c in chunks: + assert ":" in c.chunk_id + + +def test_chunk_file_skips_binary_content() -> None: + binary = "PNG\x00\x00\x00\x00" + ("A" * 1000) + assert chunk_file("img.png", binary) == [] + + +def test_chunk_file_skips_oversize_content() -> None: + huge = "x" * (MAX_FILE_BYTES + 1) + assert chunk_file("huge.txt", huge) == [] + + +def test_chunk_file_empty_input_returns_empty() -> None: + assert chunk_file("empty.py", "") == [] + assert chunk_file("ws.py", " \n \n") != [] # whitespace lines count + + +def test_chunk_file_is_deterministic() -> None: + c = "import os\nimport sys\n" * 30 + a = chunk_file("x.py", c) + b = chunk_file("x.py", c) + assert [ch.chunk_id for ch in a] == [ch.chunk_id for ch in b] + assert [ch.start_line for ch in a] == [ch.start_line for ch in b] + + +def test_chunk_file_respects_default_constants() -> None: + assert CHUNK_LINES == 40 + assert CHUNK_OVERLAP == 5 + + +# ---------------------------------------------------------------------- +# HashingEmbedder +# ---------------------------------------------------------------------- + +def test_hashing_embedder_dimension() -> None: + e = HashingEmbedder() + vecs = e(["hello world", "another sentence"]) + assert len(vecs) == 2 + assert len(vecs[0]) == e.dim + + +def test_hashing_embedder_deterministic() -> None: + e = HashingEmbedder() + a = e(["the quick brown fox"]) + b = e(["the quick brown fox"]) + assert a == b + + +def test_hashing_embedder_similar_text_ranks_above_unrelated() -> None: + e = HashingEmbedder() + query = "authentication middleware" + related = "user authentication middleware in flask" + unrelated = "matplotlib pyplot bar chart" + qv, rv, uv = e([query, related, unrelated]) + sim_related = cosine_similarity(qv, rv) + sim_unrelated = cosine_similarity(qv, uv) + assert sim_related > sim_unrelated + + +# ---------------------------------------------------------------------- +# Store names / sanitisation +# 
---------------------------------------------------------------------- + +def test_collection_name_is_chromadb_safe() -> None: + name = _collection_name("My Org", "weird/repo", "feature/ai-stuff") + # Alphanumeric / underscore / hyphen only. Starts and ends + # alphanumeric. Within length bounds. + assert 3 <= len(name) <= 512 + assert name[0].isalnum() and name[-1].isalnum() + import re + assert re.fullmatch(r"[A-Za-z0-9_-]+", name) + + +def test_sanitize_strips_unsafe_chars() -> None: + assert _sanitize("foo bar/baz") == "foo_bar_baz" + + +# ---------------------------------------------------------------------- +# Store round-trip +# ---------------------------------------------------------------------- + +def test_store_round_trip_add_count_query(tmp_path: Path) -> None: + chunks = [ + Chunk( + chunk_id="A:1", path="src/auth.py", + start_line=1, end_line=20, + text="def authenticate(user, password):\n return check_credentials(user, password)\n", + file_sha="aaa", + ), + Chunk( + chunk_id="B:1", path="src/util.py", + start_line=1, end_line=10, + text="def helper(x):\n return x * 2\n", + file_sha="bbb", + ), + ] + s = RagStore( + owner="o", repo="r", branch="main", + embedder=HashingEmbedder(), + persist_dir=tmp_path, + ) + n = s.add_chunks(chunks) + assert n == 2 + assert s.count() == 2 + + hits = s.query("authentication password", k=2) + assert hits, "query returned no hits" + # Best hit should be the auth chunk. 
+ assert hits[0].path == "src/auth.py" + + +def test_store_delete_by_path_removes_only_that_file(tmp_path: Path) -> None: + s = RagStore( + owner="o", repo="r", branch="main", + embedder=HashingEmbedder(), + persist_dir=tmp_path, + ) + chunks = [ + Chunk( + chunk_id="A:1", path="a.py", start_line=1, end_line=5, + text="content of a", file_sha="aaa", + ), + Chunk( + chunk_id="B:1", path="b.py", start_line=1, end_line=5, + text="content of b", file_sha="bbb", + ), + ] + s.add_chunks(chunks) + assert s.count() == 2 + removed = s.delete_by_path("a.py") + assert removed == 1 + assert s.count() == 1 + + +def test_store_query_empty_index_returns_empty(tmp_path: Path) -> None: + s = RagStore( + owner="o", repo="r", branch="main", + embedder=HashingEmbedder(), + persist_dir=tmp_path, + ) + assert s.query("anything", k=5) == [] + + +# ---------------------------------------------------------------------- +# Indexer +# ---------------------------------------------------------------------- + +def test_indexer_first_run_indexes_all_files(tmp_path: Path) -> None: + files = [ + ("src/auth.py", "def authenticate(user, password):\n return True\n"), + ("README.md", "# Project\nThis is about astronomy and stars.\n"), + ] + report = build_idx( + files, owner="o", repo="r", branch="main", + embedder=HashingEmbedder(), + persist_dir=tmp_path, + ) + assert report.files_seen == 2 + assert report.files_indexed == 2 + assert report.files_skipped == 0 + assert report.chunks_added >= 2 + + meta = IndexMeta.load(tmp_path) + assert meta is not None + assert set(meta.indexed_files.keys()) == {"src/auth.py", "README.md"} + + +def test_indexer_skips_unchanged_files_on_second_run(tmp_path: Path) -> None: + files = [ + ("src/auth.py", "def authenticate(): pass\n"), + ("README.md", "# Project\n"), + ] + emb = HashingEmbedder() + build_idx(files, owner="o", repo="r", branch="main", embedder=emb, persist_dir=tmp_path) + + # Re-run with the same content. 
+ report = build_idx( + files, owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + assert report.files_seen == 2 + assert report.files_indexed == 0 + assert report.files_skipped == 2 + + +def test_indexer_reindexes_changed_file(tmp_path: Path) -> None: + emb = HashingEmbedder() + build_idx( + [("src/x.py", "old content\n")], + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + report = build_idx( + [("src/x.py", "new content with totally different words\n")], + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + assert report.files_indexed == 1 + assert report.files_skipped == 0 + + +def test_indexer_embedder_change_triggers_full_rebuild(tmp_path: Path) -> None: + """If the embedder name changes, all old vectors are incomparable + and we must rebuild from scratch.""" + emb_a = HashingEmbedder() + emb_a.name = "fake-A" # type: ignore[attr-defined] + build_idx( + [("src/x.py", "hello\n")], + owner="o", repo="r", branch="main", + embedder=emb_a, persist_dir=tmp_path, + ) + emb_b = HashingEmbedder() + emb_b.name = "fake-B" # type: ignore[attr-defined] + report = build_idx( + [("src/x.py", "hello\n")], + owner="o", repo="r", branch="main", + embedder=emb_b, persist_dir=tmp_path, + ) + # Forced full rebuild → file indexed again even though sha matches. 
+ assert report.files_indexed == 1 + assert report.files_skipped == 0 + + +# ---------------------------------------------------------------------- +# Retriever +# ---------------------------------------------------------------------- + +def test_retrieve_top_k_finds_relevant_chunk(tmp_path: Path) -> None: + emb = HashingEmbedder() + build_idx( + [ + ("README.md", "# Nuclear shell model\nA tool for nuclear physics.\n"), + ("src/util.py", "def helper(x):\n return x * 2\n"), + ], + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + hits = retrieve_top_k( + "nuclear physics shell", + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + k=2, mmr=False, + ) + assert hits + assert hits[0].path == "README.md" + + +def test_retrieve_top_k_respects_k(tmp_path: Path) -> None: + emb = HashingEmbedder() + files = [ + (f"src/f_{i}.py", f"def f_{i}(): return {i}\n") + for i in range(10) + ] + build_idx( + files, owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + hits = retrieve_top_k( + "function definition", + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + k=3, mmr=False, + ) + assert len(hits) == 3 + + +def test_retrieve_top_k_empty_query_returns_empty(tmp_path: Path) -> None: + emb = HashingEmbedder() + build_idx( + [("a.py", "def foo(): pass\n")], + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + assert retrieve_top_k( + "", + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) == [] + + +def test_retrieve_top_k_missing_index_returns_empty(tmp_path: Path) -> None: + """When the persist dir doesn't exist yet the retriever must + silently return [] — agents fall back to grep, not crash.""" + assert retrieve_top_k( + "anything", + owner="o", repo="r", branch="main", + embedder=HashingEmbedder(), + persist_dir=tmp_path / "never-built", + ) == [] + + +# 
---------------------------------------------------------------------- +# Determinism across process restarts (ChromaDB persistence) +# ---------------------------------------------------------------------- + +def test_index_persists_across_store_recreations(tmp_path: Path) -> None: + emb = HashingEmbedder() + build_idx( + [("src/main.py", "import asyncio\ndef main():\n pass\n")], + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + # Construct a fresh store object pointing at the same dir. + s = RagStore( + owner="o", repo="r", branch="main", + embedder=emb, persist_dir=tmp_path, + ) + assert s.count() >= 1 + hits = s.query("import asyncio", k=1) + assert hits + assert hits[0].path == "src/main.py" diff --git a/tests/test_repo_map.py b/tests/test_repo_map.py new file mode 100644 index 0000000..425d704 --- /dev/null +++ b/tests/test_repo_map.py @@ -0,0 +1,249 @@ +"""Tests for the auto-generated repo map (Batch B6). + +Pin the contract the planner relies on: + +* Pure function: ``build_repo_map`` is deterministic and side-effect + free. Same input → identical output, byte-for-byte. +* Key-files heuristic surfaces the well-known anchors (README.md, + pyproject.toml, etc.) when they exist. +* Modules are sorted by file count descending so the planner sees + the most-populated dirs first. +* The rendered AGENTS.md blob honours a token budget — even on a + 10 000-file synthetic repo the output stays bounded. +* Round-trip through the on-disk cache reconstructs the same object. 
+""" +from __future__ import annotations + +import json +import os +from pathlib import Path + +import pytest + +from gitpilot import flags +from gitpilot.context_budget import estimate_tokens +from gitpilot.repo_map import ( + DEFAULT_MAP_TOKEN_BUDGET, + FLAG_REPO_MAP, + MAP_MAX_MODULES, + MAPS_DIR_ENV, + RepoMap, + build_repo_map, + get_or_build_repo_map, + load_cached, + save_cached, +) + + +# ---------------------------------------------------------------------- +# Sample inputs +# ---------------------------------------------------------------------- + +SMALL = [ + "README.md", + "pyproject.toml", + "src/main.py", + "src/util/io.py", + "src/util/__init__.py", + "tests/test_main.py", + "docs/intro.md", + "Dockerfile", + "LICENSE", +] + + +def _large_paths(n: int) -> list[str]: + paths = ["README.md", "pyproject.toml", "LICENSE"] + for i in range(n): + mod = f"src/mod_{i % 20:02d}" + paths.append(f"{mod}/file_{i:04d}.py") + return paths + + +# ---------------------------------------------------------------------- +# Determinism / facts +# ---------------------------------------------------------------------- + +def test_build_is_deterministic() -> None: + a = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + b = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + # generated_at differs (it's the timestamp); every other field + # must match byte-for-byte. 
+ a_dict, b_dict = a.to_dict(), b.to_dict() + a_dict.pop("generated_at", None) + b_dict.pop("generated_at", None) + assert a_dict == b_dict + + +def test_well_known_files_promoted_to_key() -> None: + m = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + assert "README.md" in m.key_files + assert "pyproject.toml" in m.key_files + assert "Dockerfile" in m.key_files + assert "LICENSE" in m.key_files + + +def test_modules_sorted_by_file_count_desc() -> None: + m = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + counts = [mod.file_count for mod in m.modules] + assert counts == sorted(counts, reverse=True) + + +def test_root_files_appear_as_root_pseudo_module() -> None: + m = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + root_modules = [mod for mod in m.modules if mod.path == "(root)"] + assert len(root_modules) == 1 + # README, pyproject, Dockerfile, LICENSE are all root-level. + assert root_modules[0].file_count >= 4 + + +def test_languages_histogram_counts_extensions() -> None: + m = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + assert m.languages.get("py", 0) >= 3 + assert m.languages.get("md", 0) >= 1 + assert m.languages.get("toml", 0) == 1 + + +def test_total_files_dedupes() -> None: + """Duplicate paths must not double-count.""" + m = build_repo_map( + owner="o", repo="r", branch="main", + paths=SMALL + SMALL, + ) + assert m.total_files == len(set(SMALL)) + + +# ---------------------------------------------------------------------- +# Budget / scaling +# ---------------------------------------------------------------------- + +def test_agents_md_under_budget_on_large_repo() -> None: + m = build_repo_map( + owner="o", repo="r", branch="main", + paths=_large_paths(2000), + token_budget=500, + ) + assert estimate_tokens(m.agents_md) <= 700 # generous slack + assert "Total files" in m.agents_md + assert "Modules" in m.agents_md + + +def test_modules_capped_at_hard_max() -> None: + """We never 
want the map to grow unbounded with module count.""" + paths = [f"mod_{i}/file.py" for i in range(MAP_MAX_MODULES * 3)] + m = build_repo_map(owner="o", repo="r", branch="main", paths=paths) + assert len(m.modules) <= MAP_MAX_MODULES + + +def test_agents_md_mentions_tools_for_drill_down() -> None: + """The map's footer must point the agent to the B1/B2 tools so it + knows what to do when it needs more detail than the map shows.""" + m = build_repo_map(owner="o", repo="r", branch="main", paths=SMALL) + assert "Find files matching a pattern" in m.agents_md + assert "Search file contents" in m.agents_md + assert "Read file content" in m.agents_md + + +# ---------------------------------------------------------------------- +# Persistence +# ---------------------------------------------------------------------- + +def test_round_trip_through_cache(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + monkeypatch.setenv(MAPS_DIR_ENV, str(tmp_path)) + a = build_repo_map( + owner="o", repo="r", branch="main", + paths=SMALL, commit_sha="deadbeef", + ) + save_cached(a) + b = load_cached("o", "r", "main") + assert b is not None + assert b.owner == a.owner + assert b.repo == a.repo + assert b.branch == a.branch + assert b.commit_sha == a.commit_sha + assert b.total_files == a.total_files + assert b.key_files == a.key_files + assert [m.path for m in b.modules] == [m.path for m in a.modules] + + +def test_load_cached_returns_none_for_missing(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + monkeypatch.setenv(MAPS_DIR_ENV, str(tmp_path)) + assert load_cached("nope", "nope", "main") is None + + +# ---------------------------------------------------------------------- +# get_or_build (cache-aware) +# ---------------------------------------------------------------------- + +def test_get_or_build_uses_cache_when_sha_matches( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setenv(MAPS_DIR_ENV, str(tmp_path)) + + calls = {"n": 0} + def 
_provider(): + calls["n"] += 1 + return SMALL + + a = get_or_build_repo_map( + owner="o", repo="r", branch="main", + paths_provider=_provider, commit_sha="abc123", + ) + b = get_or_build_repo_map( + owner="o", repo="r", branch="main", + paths_provider=_provider, commit_sha="abc123", + ) + assert a.commit_sha == b.commit_sha == "abc123" + # Provider should have been called only on the first build. + assert calls["n"] == 1 + + +def test_get_or_build_refreshes_on_new_sha( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setenv(MAPS_DIR_ENV, str(tmp_path)) + + calls = {"n": 0} + def _provider(): + calls["n"] += 1 + return SMALL + + get_or_build_repo_map( + owner="o", repo="r", branch="main", + paths_provider=_provider, commit_sha="aaa", + ) + get_or_build_repo_map( + owner="o", repo="r", branch="main", + paths_provider=_provider, commit_sha="bbb", + ) + assert calls["n"] == 2 + + +def test_force_rebuilds_even_with_same_sha( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setenv(MAPS_DIR_ENV, str(tmp_path)) + + calls = {"n": 0} + def _provider(): + calls["n"] += 1 + return SMALL + + get_or_build_repo_map( + owner="o", repo="r", branch="main", + paths_provider=_provider, commit_sha="aaa", + ) + get_or_build_repo_map( + owner="o", repo="r", branch="main", + paths_provider=_provider, commit_sha="aaa", + force=True, + ) + assert calls["n"] == 2 + + +# ---------------------------------------------------------------------- +# Sanity +# ---------------------------------------------------------------------- + +def test_default_budget_is_documented_value() -> None: + assert DEFAULT_MAP_TOKEN_BUDGET == 500 diff --git a/tests/test_sandbox_api.py b/tests/test_sandbox_api.py new file mode 100644 index 0000000..4dae2be --- /dev/null +++ b/tests/test_sandbox_api.py @@ -0,0 +1,228 @@ +"""Tests for the /api/sandbox/* HTTP surface. 
+ +These tests pin down the persistence + transparency contract so a +future change can't silently regress the production behaviour: + +- backend persistence: PUT /config writes through, GET /status reads back +- env override: GITPILOT_SANDBOX leaks into /status.env_override +- token redaction: GET /settings never returns matrixlab_token +- snippet execution: POST /run with the default subprocess backend + produces stdout the agent can read +- language whitelist: unknown languages return 400 not 500 +- MatrixLab routing: backend=matrixlab POSTs to /code/run (verified + with httpx.MockTransport rather than a live runner) +- lifecycle gating: mutating endpoints 403 when the env flag is off + +These run without Docker, without Ollama, and without MatrixLab — they +exercise our code path only. +""" +from __future__ import annotations + +import json +import os +from typing import Iterator + +import httpx +import pytest +from fastapi.testclient import TestClient + +from gitpilot.settings import reload_settings + + +@pytest.fixture() +def client(tmp_path, monkeypatch) -> Iterator[TestClient]: + """Spin up the FastAPI app with an isolated config dir so the + tests don't read/write the real ~/.gitpilot/settings.json.""" + # CONFIG_DIR / CONFIG_FILE are module constants captured at import + # time from the env, so a plain setenv after import is too late. + # Patch the module attributes directly so reload_settings() reads + # from our tmp dir. + from gitpilot import settings as settings_module + + cfg_dir = tmp_path / "cfg" + cfg_dir.mkdir(parents=True, exist_ok=True) + monkeypatch.setattr(settings_module, "CONFIG_DIR", cfg_dir) + monkeypatch.setattr(settings_module, "CONFIG_FILE", cfg_dir / "settings.json") + monkeypatch.setenv("GITPILOT_CONFIG_DIR", str(cfg_dir)) + # Clear any env var that would override the persisted backend + # so each test starts from the on-disk default. 
+ for name in ( + "GITPILOT_SANDBOX", + "GITPILOT_MATRIXLAB_URL", + "GITPILOT_MATRIXLAB_TOKEN", + "GITPILOT_MATRIXLAB_IMAGE", + "GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE", + ): + monkeypatch.delenv(name, raising=False) + reload_settings() + from gitpilot.api import app + + with TestClient(app) as c: + yield c + reload_settings() + + +def test_status_defaults_to_subprocess(client: TestClient) -> None: + r = client.get("/api/sandbox/status") + assert r.status_code == 200 + data = r.json() + assert data["backend"] == "subprocess" + assert data["ok"] is True + assert data["has_token"] is False + assert data["env_override"] is None + assert sorted(data["available_backends"]) == ["matrixlab", "off", "subprocess"] + + +def test_config_persists_across_calls(client: TestClient) -> None: + r = client.put( + "/api/sandbox/config", + json={"backend": "matrixlab", "matrixlab_url": "http://matrix.example:9000"}, + ) + assert r.status_code == 200 + data = r.json() + assert data["backend"] == "matrixlab" + assert data["matrixlab_url"] == "http://matrix.example:9000" + # And GET /status now reflects it. + r = client.get("/api/sandbox/status") + assert r.json()["backend"] == "matrixlab" + assert r.json()["matrixlab_url"] == "http://matrix.example:9000" + + +def test_config_rejects_unknown_backend(client: TestClient) -> None: + r = client.put("/api/sandbox/config", json={"backend": "docker"}) + assert r.status_code == 400 + assert "unknown sandbox backend" in r.json()["detail"] + + +def test_token_redacted_from_settings_response(client: TestClient) -> None: + """The secret matrixlab_token must never round-trip back to the + browser. 
GET /settings returns has_token instead.""" + client.put( + "/api/sandbox/config", + json={"backend": "matrixlab", "matrixlab_token": "very-secret"}, + ) + r = client.get("/api/settings") + sandbox_block = r.json()["sandbox"] + assert sandbox_block["has_token"] is True + assert "matrixlab_token" not in sandbox_block + + +def test_env_var_surfaces_as_override(client: TestClient, monkeypatch) -> None: + """Operators sometimes pin the backend with an env var on the host. + The UI must be able to render an "env override" badge so users + don't think their UI choice was silently lost.""" + monkeypatch.setenv("GITPILOT_SANDBOX", "subprocess") + reload_settings() # pick up the env after fixture cleanup + r = client.get("/api/sandbox/status") + assert r.json()["env_override"] == "GITPILOT_SANDBOX" + + +def test_run_python_via_subprocess_backend(client: TestClient) -> None: + """Default backend executes a snippet and surfaces stdout.""" + r = client.post( + "/api/sandbox/run", + json={"language": "python", "code": "print('it works'); print(7 + 5)"}, + ) + assert r.status_code == 200 + data = r.json() + assert data["backend"] == "subprocess" + assert data["exit_code"] == 0 + assert "it works" in data["stdout"] + assert "12" in data["stdout"] + + +def test_run_surfaces_python_traceback(client: TestClient) -> None: + """Error retrieval contract: stderr comes back verbatim, exit + code non-zero. 
This is what the agent's run_in_sandbox tool + reads to plan a fix.""" + r = client.post( + "/api/sandbox/run", + json={"language": "python", "code": "raise ValueError('boom')"}, + ) + assert r.status_code == 200 + data = r.json() + assert data["exit_code"] != 0 + assert "ValueError" in data["stderr"] + assert "boom" in data["stderr"] + + +def test_run_rejects_unknown_language(client: TestClient) -> None: + r = client.post( + "/api/sandbox/run", + json={"language": "ruby", "code": "puts 'hi'"}, + ) + assert r.status_code == 400 + detail = r.json()["detail"] + assert "ruby" in detail + + +def test_run_rejects_empty_code(client: TestClient) -> None: + r = client.post("/api/sandbox/run", json={"language": "python", "code": " "}) + assert r.status_code == 400 + + +def test_matrixlab_backend_calls_code_run(client: TestClient, monkeypatch) -> None: + """When the user picks matrixlab, GitPilot must POST to /code/run + (the snippet endpoint), not /repo/run (which expects a repo_url). + Use httpx.MockTransport so we don't need a real runner.""" + captured: dict = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + captured["body"] = json.loads(request.content) + return httpx.Response( + 200, + json={ + "sandbox_id": "test-uuid", + "exit_code": 0, + "stdout": "matrixlab response", + "stderr": "", + "duration_ms": 123, + }, + ) + + mock_transport = httpx.MockTransport(handler) + real_async_client = httpx.AsyncClient + + def _patched(*args, **kwargs): + return real_async_client(transport=mock_transport, **{k: v for k, v in kwargs.items() if k != "transport"}) + + monkeypatch.setattr(httpx, "AsyncClient", _patched) + + client.put("/api/sandbox/config", json={"backend": "matrixlab"}) + r = client.post( + "/api/sandbox/run", + json={"language": "py", "code": "print('hi')"}, + ) + assert r.status_code == 200, r.text + data = r.json() + assert data["backend"] == "matrixlab" + assert data["stdout"] == "matrixlab response" + assert 
data["sandbox_id"] == "test-uuid" + assert captured["url"].endswith("/code/run") + # Alias normalisation: "py" → "python" before hitting the runner. + assert captured["body"]["language"] == "python" + assert captured["body"]["code"] == "print('hi')" + + +def test_lifecycle_mutating_endpoints_gated(client: TestClient) -> None: + """Without GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE, install/start/stop + must 403 — never silently execute docker on behalf of a browser + POST.""" + for path in ("/api/sandbox/matrixlab/install", + "/api/sandbox/matrixlab/start", + "/api/sandbox/matrixlab/stop"): + r = client.post(path) + assert r.status_code == 403, f"{path} must require the env flag" + assert "GITPILOT_ENABLE_MATRIXLAB_LIFECYCLE" in r.json()["detail"] + + +def test_lifecycle_status_always_safe(client: TestClient) -> None: + """GET /lifecycle is read-only; safe to call even when the env + flag is off. Reports lifecycle_enabled=False so the UI can render + the right hint.""" + r = client.get("/api/sandbox/matrixlab/lifecycle") + assert r.status_code == 200 + data = r.json() + assert data["lifecycle_enabled"] is False + assert "instructions" in data diff --git a/tests/test_task_recorder.py b/tests/test_task_recorder.py new file mode 100644 index 0000000..199148b --- /dev/null +++ b/tests/test_task_recorder.py @@ -0,0 +1,291 @@ +"""Tests for the right-sidebar Tasks recorder. + +Pin the contract used by ``/api/chat/plan`` and ``/api/chat/execute`` +to record each top-level AI invocation as a Task on the active +session, and the read endpoint ``/api/sessions/{sid}/tasks`` that +surfaces them to the chat UI. 
+""" +from __future__ import annotations + +from typing import Iterator + +import pytest +from fastapi.testclient import TestClient + +from gitpilot import api as api_module +from gitpilot import flags +from gitpilot.session import Task +from gitpilot.task_recorder import ( + FLAG_TASKS_SIDEBAR, + begin_task, + finish_task, +) + + +@pytest.fixture() +def client() -> Iterator[TestClient]: + yield TestClient(api_module.app) + + +# ---------------------------------------------------------------------- +# Recorder primitives +# ---------------------------------------------------------------------- + +def test_begin_task_appends_running_entry() -> None: + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="rec-begin" + ) + task = begin_task(api_module._session_mgr, session.id, kind="plan", title="hello") + assert task is not None + assert task.status == "running" + assert task.title == "hello" + + reloaded = api_module._session_mgr.load(session.id) + assert len(reloaded.tasks) == 1 + assert reloaded.tasks[0].status == "running" + + +def test_finish_task_marks_completed_and_records_duration() -> None: + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="rec-finish" + ) + task = begin_task(api_module._session_mgr, session.id, kind="plan", title="t") + assert task is not None + finish_task( + api_module._session_mgr, + session.id, + task, + status="completed", + prompt_tokens=120, + completion_tokens=40, + ) + + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.tasks[0].status == "completed" + assert reloaded.tasks[0].completed_at is not None + assert reloaded.tasks[0].duration_ms is not None + assert reloaded.tasks[0].prompt_tokens == 120 + assert reloaded.tasks[0].completion_tokens == 40 + + +def test_finish_task_failed_path_records_error() -> None: + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="rec-fail" + ) + task = 
begin_task(api_module._session_mgr, session.id, kind="plan", title="t") + finish_task( + api_module._session_mgr, + session.id, + task, + status="failed", + error="boom: something exploded", + ) + + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.tasks[0].status == "failed" + assert reloaded.tasks[0].error is not None + assert "boom" in reloaded.tasks[0].error + + +def test_begin_task_no_session_id_is_noop() -> None: + """Calling without a session id must not error — older frontends + don't send one, and we must stay byte-identical to today for them.""" + assert begin_task(api_module._session_mgr, None, kind="plan", title="x") is None + # finish_task with no task is also a no-op. + finish_task(api_module._session_mgr, None, None) + + +def test_begin_task_flag_off_is_noop() -> None: + """The kill switch turns off recording entirely.""" + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="rec-flag-off" + ) + flags.set_override(FLAG_TASKS_SIDEBAR, False) + try: + task = begin_task( + api_module._session_mgr, session.id, kind="plan", title="t" + ) + finally: + flags.clear_override(FLAG_TASKS_SIDEBAR) + assert task is None + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.tasks == [] + + +def test_finish_task_preserves_concurrent_writes_to_session() -> None: + """Real-world race: a task is in flight; another endpoint writes a + new branch onto the session; finish_task must not clobber that.""" + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="rec-race" + ) + task = begin_task(api_module._session_mgr, session.id, kind="execute", title="t") + + # Simulate a concurrent write (the execute handler persisting the + # new branch). 
+ concurrent = api_module._session_mgr.load(session.id) + concurrent.branch = "gitpilot-foo-123456" + api_module._session_mgr.save(concurrent) + + finish_task(api_module._session_mgr, session.id, task, status="completed") + + reloaded = api_module._session_mgr.load(session.id) + assert reloaded.branch == "gitpilot-foo-123456" # concurrent write preserved + assert reloaded.tasks[0].status == "completed" + + +# ---------------------------------------------------------------------- +# Read endpoint +# ---------------------------------------------------------------------- + +def test_tasks_endpoint_returns_documented_shape( + client: TestClient, +) -> None: + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="endpoint-shape" + ) + task = begin_task( + api_module._session_mgr, session.id, kind="plan", title="hello world" + ) + finish_task( + api_module._session_mgr, session.id, task, + status="completed", prompt_tokens=12, completion_tokens=4, + ) + + r = client.get(f"/api/sessions/{session.id}/tasks") + assert r.status_code == 200, r.text + body = r.json() + assert body["session_id"] == session.id + assert len(body["tasks"]) == 1 + row = body["tasks"][0] + for key in ( + "id", "kind", "title", "status", + "started_at", "completed_at", "duration_ms", + "prompt_tokens", "completion_tokens", "error", + ): + assert key in row, f"missing key {key}" + assert row["status"] == "completed" + assert row["title"] == "hello world" + + +def test_tasks_endpoint_returns_404_for_unknown_session(client: TestClient) -> None: + r = client.get("/api/sessions/does-not-exist/tasks") + assert r.status_code == 404 + + +def test_tasks_endpoint_returns_404_when_flag_off(client: TestClient) -> None: + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="endpoint-flag" + ) + flags.set_override(FLAG_TASKS_SIDEBAR, False) + try: + r = client.get(f"/api/sessions/{session.id}/tasks") + finally: + 
flags.clear_override(FLAG_TASKS_SIDEBAR) + assert r.status_code == 404 + + +# ---------------------------------------------------------------------- +# Endpoint wiring — Plan + Execute land a task on the session +# ---------------------------------------------------------------------- + +def test_chat_plan_records_a_task( + client: TestClient, monkeypatch: pytest.MonkeyPatch +) -> None: + """End-to-end: POST /api/chat/plan with a session_id must leave a + completed Task entry on that session.""" + from contextlib import contextmanager + + @contextmanager + def _noop_ctx(*_a, **_kw): + yield + + async def _ok(goal, repo_full_name, token=None, branch_name=None, **_kw): + return {"goal": goal, "summary": "ok", "steps": []} + + monkeypatch.setattr(api_module, "execution_context", _noop_ctx) + monkeypatch.setattr(api_module, "generate_plan", _ok) + monkeypatch.setattr(api_module, "generate_plan_lite", _ok) + monkeypatch.setattr(api_module, "_is_lite_mode_active", lambda: False) + monkeypatch.setattr(api_module, "get_github_token", lambda *_a, **_kw: None) + + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="plan-track" + ) + + r = client.post( + "/api/chat/plan", + json={ + "repo_owner": "o", + "repo_name": "r", + "goal": "Make a thing", + "branch_name": "main", + "session_id": session.id, + }, + ) + assert r.status_code == 200, r.text + + reloaded = api_module._session_mgr.load(session.id) + assert len(reloaded.tasks) == 1 + assert reloaded.tasks[0].kind == "plan" + assert reloaded.tasks[0].status == "completed" + assert reloaded.tasks[0].title == "Make a thing" + + +def test_chat_plan_records_failed_task_on_error( + client: TestClient, monkeypatch: pytest.MonkeyPatch +) -> None: + """Errors from the planner produce a failed Task entry — the trace + must capture failures, not just successes.""" + from contextlib import contextmanager + + @contextmanager + def _noop_ctx(*_a, **_kw): + yield + + async def _bad(goal, repo_full_name, 
token=None, branch_name=None, **_kw): + raise RuntimeError("completely unrelated boom") + + monkeypatch.setattr(api_module, "execution_context", _noop_ctx) + monkeypatch.setattr(api_module, "generate_plan", _bad) + monkeypatch.setattr(api_module, "generate_plan_lite", _bad) + monkeypatch.setattr(api_module, "_is_lite_mode_active", lambda: False) + monkeypatch.setattr(api_module, "get_github_token", lambda *_a, **_kw: None) + + session = api_module._session_mgr.create( + repo_full_name="o/r", branch="main", name="plan-fail-track" + ) + + r = client.post( + "/api/chat/plan", + json={ + "repo_owner": "o", + "repo_name": "r", + "goal": "Make it fail", + "session_id": session.id, + }, + ) + assert r.status_code >= 400 + + reloaded = api_module._session_mgr.load(session.id) + assert len(reloaded.tasks) == 1 + assert reloaded.tasks[0].status == "failed" + assert reloaded.tasks[0].error is not None + + +# ---------------------------------------------------------------------- +# Session backwards-compatibility +# ---------------------------------------------------------------------- + +def test_session_loads_without_tasks_field(tmp_path) -> None: + """Session files written before this feature must still load.""" + from gitpilot.session import Session + + raw = { + "id": "abc123", + "messages": [], + "checkpoints": [], + "repos": [], + # "tasks" is deliberately absent: files written before this feature lack the key. + } + session = Session.from_dict(raw) + assert session.tasks == []