5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

## 0.0.10-beta.12 — 2026-05-10

### Features
- Pi: enforce `Stop` policies (`require-commit-before-stop`, `require-push-before-stop`, etc.) on the next user turn via `before_agent_start` injection. Pi's `AgentEndEvent` has no Result type — by the time it fires, Pi's agent loop has already exited, so a deny return cannot keep Pi running the way Claude's exit-2-from-Stop can. Empirically observed: a user on Pi with `require-commit-before-stop` enabled saw the deny `reason` ("You have uncommitted changes …") propagate to Pi but Pi exited anyway. Fix: the `pi-extension/index.ts` shim captures any deny `reason` from `agent_end` into a per-`sessionId` in-memory map, then on the next `before_agent_start` (Pi v0.73.x — fires after the user submits a prompt, before the agent loop runs) returns `{systemPrompt: <event.systemPrompt> + "\n\n" + reason}` so the LLM sees a `MANDATORY ACTION REQUIRED` directive at the top of its next turn. The map is one-shot per drain and is cleared on every `session_shutdown` reason (including `quit`), so a stale gate cannot leak into a fresh session started in the same Pi process. `policy-evaluator.ts` now emits the `MANDATORY ACTION REQUIRED from failproofai (policy: …)` wrapper inside `reason` for `cli === "pi" && eventType === "Stop"` (both the deny and instruct paths), mirroring the Cursor/Gemini/Copilot/OpenCode Stop branches; non-Stop Pi events keep the existing flat `{permission, reason}` shape. Bounded by Pi process lifetime — same bound Claude has on exit-2-from-Stop (kill the agent and the gate is missed). New shim unit tests cover capture/drain/one-shot/`session_shutdown`-clear/no-pending-noop/missing-systemPrompt and a new e2e test pins the binary's stdout shape (`{permission:"deny", reason:/MANDATORY ACTION REQUIRED.*require-commit-before-stop.*uncommitted changes/}`) for `agent_end` in a dirty repo (#PR).

## 0.0.10-beta.11 — 2026-05-10

### Fixes
21 changes: 11 additions & 10 deletions CLAUDE.md
@@ -218,18 +218,19 @@ pi-extension. Same self-reference caveat applies — do **not** install the
standard `npx` form from inside this repo.

**Pi limitations vs. Claude semantics** (verified against pi-coding-agent
-v0.72.1 d.ts; the `pi-extension/` shim subscribes to 7 events but Pi's API
v0.73.1 d.ts; the `pi-extension/` shim subscribes to 8 events but Pi's API
caps what each handler can do):

-| Pi event | → Claude event | Veto / mutate? | Notes |
-|--------------------|------------------|----------------|-------|
-| `tool_call` | PreToolUse | ✅ block | Full deny support via `{block, reason}`. |
-| `user_bash` | PreToolUse | ✅ block | Full deny support. |
-| `input` | UserPromptSubmit | ✅ block | Full deny support. |
-| `session_start` | SessionStart | observation | No return-value effect on Pi. |
-| `tool_result` | PostToolUse | observation | `ToolResultEventResult` exposes `{content, details, isError}` for mutation but no `block`. PostToolUse is observation/sanitize anyway, matching Claude semantics. |
-| `agent_end` | Stop | observation | Pi's agent loop has already exited; we cannot keep Pi running the way Claude's exit-2-from-Stop can. `require-*-before-stop` policies still RUN — their findings land in the activity store + stderr — but the stop is not vetoed. |
-| `session_shutdown` | SessionEnd | observation | Symmetry only. |
| Pi event | → Claude event | Veto / mutate? | Notes |
|----------------------|------------------|-----------------|-------|
| `tool_call` | PreToolUse | ✅ block | Full deny support via `{block, reason}`. |
| `user_bash` | PreToolUse | ✅ block | Full deny support. |
| `input` | UserPromptSubmit | ✅ block | Full deny support. |
| `session_start` | SessionStart | observation | No return-value effect on Pi. |
| `tool_result` | PostToolUse | observation | `ToolResultEventResult` exposes `{content, details, isError}` for mutation but no `block`. PostToolUse is observation/sanitize anyway, matching Claude semantics. |
| `agent_end` | Stop | shifted (next turn) | Pi's `AgentEndEvent` has no Result type — we cannot retry the same loop the way Claude's exit-2-from-Stop can. The shim captures any deny `reason` and stashes it keyed by sessionId for the next `before_agent_start` handler to drain. The 5 `require-*-before-stop` builtins thus enforce by gating the NEXT user turn's system prompt. Bounded by Pi process lifetime — same bound Claude has on exit-2-from-Stop. |
| `before_agent_start` | (Pi-only handoff) | systemPrompt | Drains any pending Stop deny captured at `agent_end`, returning `{systemPrompt: <event.systemPrompt> + "\n\n" + reason}` so the LLM sees the MANDATORY ACTION directive before its next turn. Multiple extensions chain. No injection when no block is pending. |
| `session_shutdown` | SessionEnd | observation | Symmetry only. Also clears any pending stop-block for the session id (every reason, not just `new`/`resume`/`fork`). |
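
The `agent_end` → `before_agent_start` → `session_shutdown` rows above describe a one-shot handoff keyed by session id. A minimal standalone sketch of that lifecycle follows; the class and method names (`PendingStopBlocks`, `capture`, `drain`, `clear`) are hypothetical and chosen for illustration, and the shim itself uses a module-level `Map` rather than a class.

```typescript
// Hypothetical illustration of the capture/drain/clear lifecycle.
// None of these names come from the shim; only the behaviour is modelled.
type Decision = { block: boolean; reason?: string };

class PendingStopBlocks {
  private readonly bySession = new Map<string, string>();

  /** agent_end: remember a deny reason, but only when a session id is known. */
  capture(sessionId: string | undefined, decision: Decision): void {
    if (decision.block && decision.reason && sessionId) {
      this.bySession.set(sessionId, decision.reason);
    }
  }

  /** before_agent_start: return the augmented prompt once, then forget it. */
  drain(
    sessionId: string | undefined,
    baseSystemPrompt?: string,
  ): { systemPrompt: string } | undefined {
    if (!sessionId) return undefined;
    const pending = this.bySession.get(sessionId);
    if (!pending) return undefined;
    this.bySession.delete(sessionId);
    return { systemPrompt: `${baseSystemPrompt ?? ""}\n\n${pending}` };
  }

  /** session_shutdown: drop the entry regardless of the shutdown reason. */
  clear(sessionId: string | undefined): void {
    if (sessionId) this.bySession.delete(sessionId);
  }
}
```

Because `drain` deletes the entry before returning, a second `before_agent_start` in the same session yields `undefined`, which matches the "No injection when no block is pending" note in the table.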

**Instruct (`additionalContext`) on Pi `tool_call`** — Pi's
`ToolCallEventResult` shape is `{block?, reason?}` only; there's no
33 changes: 33 additions & 0 deletions __tests__/e2e/hooks/pi-integration.e2e.test.ts
@@ -208,6 +208,39 @@ describe("E2E: Pi integration — hook protocol (handler-only)", () => {
}
});

it("agent_end with require-commit-before-stop in a dirty repo emits MANDATORY ACTION reason", () => {
const env = createPiEnv();
try {
// Make env.cwd a git repo with an uncommitted file so
// require-commit-before-stop returns deny. This is the bridge to the
// shim-side handoff: the binary's stdout MUST be a Pi-flat
// `{permission:"deny", reason}` payload whose reason carries the
// "MANDATORY ACTION REQUIRED" wrapper. The shim captures that
// reason on agent_end and re-injects it via before_agent_start (the
// in-process map handoff is covered by the shim unit tests).
execSync("git init -q && git config user.email t@t && git config user.name t", { cwd: env.cwd });
writeFileSync(resolve(env.cwd, "dirty.txt"), "uncommitted\n");
writeConfig(env.cwd, ["require-commit-before-stop"]);
const result = runHook(
"agent_end",
PiPayloads.agentEnd(env.cwd),
{ homeDir: env.home, cli: "pi" },
);
// Stop on Pi uses the MANDATORY ACTION wrapping (not the generic
// `Blocked <displayTool> by failproofai because: …` wording that
// assertPiDeny matches), so we inline the deny shape checks here.
expect(result.exitCode).toBe(0);
expect(result.parsed?.permission).toBe("deny");
expect(result.parsed?.hookSpecificOutput).toBeUndefined();
const reason = String(result.parsed?.reason ?? "");
expect(reason).toMatch(/MANDATORY ACTION REQUIRED/);
expect(reason).toMatch(/require-commit-before-stop/);
expect(reason).toMatch(/uncommitted changes/i);
} finally {
env.cleanup();
}
});

it("agent-settings guard: Bash read of .pi/settings.json is denied", () => {
const env = createPiEnv();
try {
134 changes: 133 additions & 1 deletion __tests__/hooks/pi-extension-shim.test.ts
@@ -24,10 +24,23 @@ interface PiExtensionApi {

const captured: CapturedCall[] = [];

/** Per-event stdout reply queue for the spawnSync mock. Tests that need
* to simulate a binary deny set `mockSpawnReplyByEvent[<eventName>]`
* to a JSON string before invoking the matching handler. Any event not
* in the map gets the default empty stdout. */
const mockSpawnReplyByEvent: Record<string, string | undefined> = {};

function eventNameFromArgs(args: string[]): string | undefined {
const i = args.indexOf("--hook");
return i >= 0 ? args[i + 1] : undefined;
}

vi.mock("node:child_process", () => ({
spawnSync: (_cmd: string, args: string[], opts: { input?: string }) => {
captured.push({ args: args ?? [], payload: JSON.parse(opts?.input ?? "{}") });
-return { pid: 0, output: [], status: 0, signal: null, stderr: "", stdout: "" };
const evt = eventNameFromArgs(args ?? []);
const stdout = (evt && mockSpawnReplyByEvent[evt]) ?? "";
return { pid: 0, output: [], status: 0, signal: null, stderr: "", stdout };
},
}));
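
The reply lookup used by the mock can be exercised in isolation. The sketch below restates it as a standalone function; `replies` and `stdoutFor` are hypothetical names, and the argument lists are assumptions made for the example, since the shim's real argv layout is not shown in this diff beyond the `--hook` flag.

```typescript
// Standalone restatement of the mock's per-event stdout lookup.
// `replies` and `stdoutFor` are illustrative names, not part of the test file.
const replies: Record<string, string | undefined> = {};

function eventNameFromArgs(args: string[]): string | undefined {
  const i = args.indexOf("--hook");
  return i >= 0 ? args[i + 1] : undefined;
}

function stdoutFor(args: string[]): string {
  const evt = eventNameFromArgs(args);
  // An unknown event, a missing flag, or an unset reply all yield the
  // default empty stdout, which the shim treats as "allow".
  return (evt && replies[evt]) ?? "";
}

// Simulate a binary deny for agent_end only.
replies["agent_end"] = JSON.stringify({ permission: "deny", reason: "X" });
```

Keying replies by event name lets a single test make `agent_end` deny while every other hook call in the same test still sees the default allow.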

@@ -43,6 +56,7 @@ describe("pi-extension shim — sessionId resolution via on-disk discovery", ()

beforeEach(async () => {
captured.length = 0;
for (const k of Object.keys(mockSpawnReplyByEvent)) delete mockSpawnReplyByEvent[k];
handlers = {};
piRoot = mkdtempSync(join(tmpdir(), "pi-shim-test-"));
originalEnv = process.env.PI_SESSIONS_DIR;
@@ -240,4 +254,122 @@ describe("pi-extension shim — sessionId resolution via on-disk discovery", ()
});
});

/**
* Pi cannot veto `agent_end` directly (Pi's AgentEndEvent has no Result type).
* The shim captures any deny reason and re-injects it as a `systemPrompt`
* suffix on the next `before_agent_start`. These tests cover that handoff.
*/
describe("pi-extension shim — agent_end → before_agent_start stop-block handoff", () => {
let handlers: Record<string, (event: unknown) => unknown> = {};
let piRoot: string;
let originalEnv: string | undefined;
const SID = "ffffffff-ffff-ffff-ffff-ffffffffffff";

beforeEach(async () => {
captured.length = 0;
for (const k of Object.keys(mockSpawnReplyByEvent)) delete mockSpawnReplyByEvent[k];
handlers = {};
piRoot = mkdtempSync(join(tmpdir(), "pi-shim-handoff-"));
originalEnv = process.env.PI_SESSIONS_DIR;
process.env.PI_SESSIONS_DIR = piRoot;
// Seed a transcript so resolveSessionId returns a stable id.
const dir = join(piRoot, piEncodeCwd("/proj"));
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `2026-05-09T00-00-00-000Z_${SID}.jsonl`), "{}\n");
vi.resetModules();
const mod = await import("../../pi-extension/index");
mod.default({ on: (name, fn) => { handlers[name] = fn; } });
});

afterEach(() => {
if (originalEnv === undefined) delete process.env.PI_SESSIONS_DIR;
else process.env.PI_SESSIONS_DIR = originalEnv;
rmSync(piRoot, { recursive: true, force: true });
});

it("agent_end deny is captured and drained on next before_agent_start as a systemPrompt suffix", () => {
mockSpawnReplyByEvent["agent_end"] = JSON.stringify({
permission: "deny",
reason: "MANDATORY ACTION REQUIRED from failproofai (policy: require-commit-before-stop): commit now.",
});
handlers.agent_end({ type: "agent_end", cwd: "/proj" });
// No reply value from agent_end (Pi cannot veto stop).
const result = handlers.before_agent_start({
type: "before_agent_start",
prompt: "next prompt",
systemPrompt: "BASE",
cwd: "/proj",
}) as { systemPrompt?: string } | undefined;
expect(result?.systemPrompt).toBe(
"BASE\n\nMANDATORY ACTION REQUIRED from failproofai (policy: require-commit-before-stop): commit now.",
);
});

it("before_agent_start with no pending block returns undefined", () => {
const result = handlers.before_agent_start({
type: "before_agent_start",
prompt: "p",
systemPrompt: "BASE",
cwd: "/proj",
});
expect(result).toBeUndefined();
});

it("the stop-block is one-shot: a second before_agent_start in the same session does not re-fire", () => {
mockSpawnReplyByEvent["agent_end"] = JSON.stringify({ permission: "deny", reason: "X" });
handlers.agent_end({ type: "agent_end", cwd: "/proj" });
const first = handlers.before_agent_start({ type: "before_agent_start", systemPrompt: "B", cwd: "/proj" }) as { systemPrompt?: string };
expect(first?.systemPrompt).toBe("B\n\nX");
const second = handlers.before_agent_start({ type: "before_agent_start", systemPrompt: "B", cwd: "/proj" });
expect(second).toBeUndefined();
});

it("session_shutdown clears the pending stop-block (quit reason too, not just new/resume/fork)", () => {
mockSpawnReplyByEvent["agent_end"] = JSON.stringify({ permission: "deny", reason: "X" });
handlers.agent_end({ type: "agent_end", cwd: "/proj" });
handlers.session_shutdown({ type: "session_shutdown", reason: "quit", cwd: "/proj" });
// Even though `quit` retains the cached sessionId, the pending block must
// be dropped so a future before_agent_start (e.g. in the next session
// started in this process) doesn't inherit a stale gate.
const result = handlers.before_agent_start({
type: "before_agent_start",
systemPrompt: "B",
cwd: "/proj",
});
expect(result).toBeUndefined();
});

it("agent_end with allow stdout (empty reason) does NOT set a pending block", () => {
// Default mock returns empty stdout → callPolicy returns {block:false}.
handlers.agent_end({ type: "agent_end", cwd: "/proj" });
const result = handlers.before_agent_start({
type: "before_agent_start",
systemPrompt: "B",
cwd: "/proj",
});
expect(result).toBeUndefined();
});

it("before_agent_start without a resolvable sessionId is a no-op", () => {
// Use a cwd that has no on-disk transcript — sessionId discovery returns
// undefined and the handler must early-return without throwing.
const result = handlers.before_agent_start({
type: "before_agent_start",
systemPrompt: "B",
cwd: "/no-such-cwd",
});
expect(result).toBeUndefined();
});

it("before_agent_start with no systemPrompt in the event still injects (uses empty base)", () => {
mockSpawnReplyByEvent["agent_end"] = JSON.stringify({ permission: "deny", reason: "Y" });
handlers.agent_end({ type: "agent_end", cwd: "/proj" });
const result = handlers.before_agent_start({
type: "before_agent_start",
cwd: "/proj",
}) as { systemPrompt?: string };
expect(result?.systemPrompt).toBe("\n\nY");
});
});

import { afterEach } from "vitest";
85 changes: 77 additions & 8 deletions pi-extension/index.ts
@@ -231,6 +231,24 @@ function discoverPiSessionId(cwd: string): string | undefined {
* across multiple workspace roots) can't cross-attribute. Cleared on
* session_shutdown reasons `new`/`resume`/`fork` (Pi reuses the process). */
const cachedSessionIdByCwd = new Map<string, string>();

/** Pending Stop-policy deny reason from agent_end, keyed by sessionId.
* Drained by before_agent_start on the next user turn in the same Pi
* process. Cleared on every session_shutdown.
*
* Why this exists: Pi's agent_end has no Result type — the agent loop
* has already exited when it fires, so a deny return cannot keep Pi
* running the way Claude's exit-2-from-Stop does. The closest analog
* is to capture the deny here and re-inject it as a MANDATORY ACTION
* system-prompt addition on the NEXT before_agent_start, which fires
* after the user submits a prompt but before the agent loop runs.
* Best-effort: bounded by the Pi process lifetime — same bound Claude
* has on exit-2-from-Stop (kill the agent and the gate is missed).
*
* Why per-session not per-cwd: a Pi process can host multiple sessions
* via /resume and /fork; per-cwd would cross-attribute a stale block
* from a prior session into a fresh one. */
const pendingStopBlockBySession = new Map<string, string>();
function resolveSessionId(eventSessionId: string | undefined, cwd: string): string | undefined {
if (eventSessionId) {
cachedSessionIdByCwd.set(cwd, eventSessionId);
@@ -302,6 +320,17 @@ interface PiAgentEndEvent {
sessionId?: string;
}

/** Pi v0.73.x before_agent_start event payload. Fires once per turn,
* after the user submits a prompt but before the agent loop runs. */
interface PiBeforeAgentStartEvent {
type?: string;
prompt?: string;
/** The fully assembled system prompt for this turn — we append to it. */
systemPrompt?: string;
cwd?: string;
sessionId?: string;
}

interface PiExtensionApi {
on(event: string, handler: (event: unknown) => unknown): void;
}
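
Handlers receive `unknown` and the shim narrows with a cast to interfaces like the ones above. Where a runtime check is preferred over a cast, a guard along the following lines could be used. This is a sketch only: `BeforeAgentStartLike`, `isOptionalString`, and `asBeforeAgentStart` are hypothetical names, and the field list simply mirrors `PiBeforeAgentStartEvent` as declared in this diff.

```typescript
// Hypothetical runtime guard; the shim itself uses a plain type cast.
interface BeforeAgentStartLike {
  type?: string;
  prompt?: string;
  systemPrompt?: string;
  cwd?: string;
  sessionId?: string;
}

function isOptionalString(v: unknown): v is string | undefined {
  return v === undefined || typeof v === "string";
}

function asBeforeAgentStart(event: unknown): BeforeAgentStartLike | undefined {
  if (typeof event !== "object" || event === null) return undefined;
  const fields = event as Record<string, unknown>;
  const keys = ["type", "prompt", "systemPrompt", "cwd", "sessionId"] as const;
  // Reject payloads where a known field is present but not a string.
  if (!keys.every((k) => isOptionalString(fields[k]))) return undefined;
  return event as BeforeAgentStartLike;
}
```

A guard like this trades a little code for protection against a future Pi release changing a field's type; whether that is worth it for a thin shim is a design choice.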
@@ -384,21 +413,51 @@ export default function failproofaiBridge(pi: PiExtensionApi) {
return undefined;
});

-// agent_end → Stop. Observation-only on Pi: the agent loop has already
-// exited when this fires, so a deny decision cannot keep Pi running the
-// way Claude's exit-2-from-Stop can. We still forward so the 5
-// require-*-before-stop builtins run and log their findings (visible in
-// the dashboard's activity feed and stderr) — best-effort visibility.
// agent_end → Stop. Pi cannot veto agent_end (the agent loop has already
// exited when this fires — see the AgentEndEvent typedef in pi-coding-agent
// which has NO Result type). Instead we capture any deny reason and stash
// it keyed by sessionId for the next before_agent_start handler to drain.
// The 5 require-*-before-stop builtins thus enforce by gating the NEXT
// user turn's system prompt rather than by retrying the same loop. If the
// user kills Pi between turns, the gate is missed — same bound Claude has.
pi.on("agent_end", (event: unknown): unknown => {
const e = event as PiAgentEndEvent;
-callPolicy("agent_end", {
-session_id: resolveSessionId(e.sessionId, resolveCwd(e.cwd)),
-cwd: resolveCwd(e.cwd),
const cwd = resolveCwd(e.cwd);
const sessionId = resolveSessionId(e.sessionId, cwd);
const decision = callPolicy("agent_end", {
session_id: sessionId,
cwd,
hook_event_name: "Stop",
});
if (decision.block && decision.reason && sessionId) {
pendingStopBlockBySession.set(sessionId, decision.reason);
debug(`agent_end deny stored for session=${sessionId}`);
}
return undefined;
});

// before_agent_start → drain any pending Stop-policy deny captured at
// agent_end. This is Pi's only first-class channel to influence the next
// turn before the LLM call: the result type accepts a `systemPrompt`
// replacement (chained across extensions) and an optional injected
// CustomMessage. We only return systemPrompt — sufficient for the LLM to
// see the MANDATORY ACTION directive immediately, and avoids polluting the
// visible conversation history with framework chrome. The reason text
// already carries the policy-attributed MANDATORY ACTION wording from
// policy-evaluator's Pi-Stop branch.
pi.on("before_agent_start", (event: unknown): unknown => {
const e = event as PiBeforeAgentStartEvent;
const cwd = resolveCwd(e.cwd);
const sessionId = resolveSessionId(e.sessionId, cwd);
if (!sessionId) return undefined;
const pending = pendingStopBlockBySession.get(sessionId);
if (!pending) return undefined;
pendingStopBlockBySession.delete(sessionId);
debug(`before_agent_start drains stop-block for session=${sessionId}`);
const base = e.systemPrompt ?? "";
return { systemPrompt: `${base}\n\n${pending}` };
});

// session_shutdown → SessionEnd. Observation-only; emits a SessionEnd
// record so per-session telemetry has a clean close. Reset the per-cwd
// sessionId cache for shutdown reasons that mean "Pi is starting a new
@@ -414,9 +473,19 @@ export default function failproofaiBridge(pi: PiExtensionApi) {
reason: e.reason,
hook_event_name: "SessionEnd",
});
// Capture sessionId BEFORE the cache reset so we delete the pending
// entry under the just-ending session's id. After resetSessionIdCache,
// a subsequent resolveSessionId would re-discover from disk and could
// bind to a different (stale) file — wrong key for the cleanup below.
const sessionId = resolveSessionId(e.sessionId, cwd);
if (e.reason === "new" || e.reason === "resume" || e.reason === "fork") {
resetSessionIdCache(cwd);
}
// Drop any pending Stop-policy deny for this session on every shutdown
// reason: `quit` ends the session for good, so the entry should not
// linger in the map; `new`/`resume`/`fork` start a different session in
// the same process and must not inherit the prior session's gate.
if (sessionId) pendingStopBlockBySession.delete(sessionId);
return undefined;
});
}
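
The ordering in the `session_shutdown` handler (resolve the session id first, then reset the cache) can be illustrated with a small model. Everything below is a hypothetical stand-in: `resolveId` only approximates `resolveSessionId`, whose full body is not shown in this diff, and `discoverFromDisk` is hard-coded to return a different id so the failure mode is visible.

```typescript
// Hypothetical model of why the id is captured BEFORE the cache reset.
const cachedIdByCwd = new Map<string, string>();
const pendingBySession = new Map<string, string>();

// Assumption: after a fork, on-disk discovery may bind to another transcript.
function discoverFromDisk(_cwd: string): string | undefined {
  return "stale-session";
}

function resolveId(eventId: string | undefined, cwd: string): string | undefined {
  if (eventId) {
    cachedIdByCwd.set(cwd, eventId);
    return eventId;
  }
  const cached = cachedIdByCwd.get(cwd);
  if (cached) return cached;
  const found = discoverFromDisk(cwd);
  if (found) cachedIdByCwd.set(cwd, found);
  return found;
}

function shutdown(cwd: string, reason: string, resolveFirst: boolean): void {
  let id = resolveFirst ? resolveId(undefined, cwd) : undefined;
  if (reason === "new" || reason === "resume" || reason === "fork") {
    cachedIdByCwd.delete(cwd);
  }
  // Resolving after the reset falls through to disk discovery.
  if (!resolveFirst) id = resolveId(undefined, cwd);
  if (id) pendingBySession.delete(id);
}
```

With `resolveFirst` set to `false`, the cleanup targets whatever id discovery returns after the reset, so the ending session's pending entry can survive; with `true`, the entry is removed under the correct key.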