Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -36,6 +36,9 @@ htmlcov/
# GitHub Copilot agent configs (development only)
.github/agents/

# Claude Code skill files (developer-specific)
.claude/

# Internal research notes (development only)
docs/research/claude-code-vs-meta-harness.md
docs/research/information-bottleneck-hypothesis.md
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [0.2.5] - 2026-06-04

### Added
- **Pi agent backend** (`pi`) — adapter for the minimal open-source
[Pi coding agent](https://github.com/earendil-works/pi) (earendil-works).
Invoked in print mode (`pi -p "<prompt>"`): a single-shot, non-interactive
run that edits files in the workspace and exits. By design Pi has no
permission popups or sandbox/approval flags, so none are passed. Pi
auto-reads `AGENTS.md`, which PolyHarness already injects for this backend.
Wired through the adapter registry, `ph init --agent pi`, the `--backend`
override, ensemble selection, `ph doctor` detection, and the shell-hook
auto-wrap (`pi -p ...`).
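A minimal sketch of the print-mode invocation described in this entry, assuming a Python adapter that shells out via `subprocess`. The helper names `build_pi_command` and `run_pi`, the `timeout` default, and splitting a custom `cli_path` with `shlex.split` are assumptions for illustration; the actual `pi.py` is not part of this diff. Only `-p` and the prompt are passed, since Pi has no sandbox or approval flags.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_pi_command(prompt: str, cli_path: str = "pi") -> list[str]:
    """Hypothetical helper: argv for a single-shot, non-interactive Pi run.

    `cli_path` may contain several words (e.g. a package-runner invocation),
    so it is split shell-style. Only print mode (-p) is added: Pi has no
    permission, sandbox, or approval flags to set.
    """
    return [*shlex.split(cli_path), "-p", prompt]


def run_pi(prompt: str, workspace: Path, timeout: float = 600.0) -> subprocess.CompletedProcess[str]:
    """Hypothetical runner: execute Pi with the workspace as its working directory.

    Using the workspace as cwd is an assumption, made so that relative paths
    and the AGENTS.md written there resolve inside the workspace.
    """
    return subprocess.run(
        build_pi_command(prompt),
        cwd=workspace,
        capture_output=True,  # keep stdout/stderr for traces
        text=True,
        timeout=timeout,
        check=False,  # the caller decides how to treat a non-zero exit
    )
```

Keeping argv construction separate from execution lets the command shape be checked without Pi installed.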

## [0.2.4] - 2026-05-26

### Added
20 changes: 12 additions & 8 deletions README.md
@@ -30,7 +30,7 @@ Your AI agent runs the same harness every time. Same prompts, same tool config,
| | |
|---|---|
| **Self-Evolution** | Iteratively searches over harness changes and keeps the full evaluation history in one workspace. |
| **8 Agent Backends** | Claude Code · Claw Code · Codex · Hermes · OpenCode · API direct · OpenAI-compatible · Local — plug in any CLI agent. |
| **9 Agent Backends** | Claude Code · Claw Code · Codex · Hermes · OpenCode · Pi · API direct · OpenAI-compatible · Local — plug in any CLI agent. |
| **Full History** | Every iteration's code, scores, and traces preserved. The Meta-Harness paper reports that non-Markovian search outperforms blind retries. |
| **Search Tree** | Visualize the optimization path. Compare any two candidates with per-task diffs. |
| **One-Command Setup** | `ph init --base-harness ... --task-dir ...` — copies files, configures workspace, done. |
@@ -235,7 +235,7 @@ PolyHarness automatically sandboxes your agent inside this workspace, ensuring i

| Scenario | How to configure |
|----------|------------------|
| **Supported CLI Tools** | Run `ph init --agent <name>`. PolyHarness auto-injects required instructions (e.g., `CLAUDE.md`).<br>*(Supported: claude-code, claw-code, codex, hermes, opencode)* |
| **Supported CLI Tools** | Run `ph init --agent <name>`. PolyHarness auto-injects required instructions (e.g., `CLAUDE.md`).<br>*(Supported: claude-code, claw-code, codex, hermes, opencode, pi)* |
| **Anthropic API** | Run `ph init --agent api`. Set `export ANTHROPIC_API_KEY="sk-ant-..."` before `ph run`. |
| **OpenAI / Local Models** | Run `ph init --agent openai`. Then configure the endpoint — see [Local Model Setup](#local-model-setup) below. |
| **Custom CLI path** | If your CLI agent uses a non-standard command, edit `config.yaml` in the workspace before running:<br>`proposer: { cli_path: "npx @anthropic-ai/claude-code" }`|
@@ -306,6 +306,7 @@ ph wrap --auto-evolve claw -p "Write integration tests for payments" # Claw
ph wrap --auto-evolve codex exec "Add retry logic to the API client" # Codex
ph wrap --auto-evolve hermes chat -q "Refactor the DB connection pool" # Hermes Agent
ph wrap --auto-evolve opencode run "Fix the flaky parser test" # OpenCode
ph wrap --auto-evolve pi -p "Tighten the retry/backoff logic" # Pi

# Local models — wrap the CLI command directly
ph wrap --auto-evolve ollama run gemma3 "Summarize this document" # Ollama
@@ -376,9 +377,10 @@ claw -p "Write payment tests" # same — auto-wrapped
codex exec "Add retry logic" # same
hermes chat -q "Refactor pool" # same
opencode run "Fix flaky test" # same
pi -p "Tighten retry logic" # same
```

How it works: a `preexec` hook in your shell detects `claude`/`claw`/`codex`/`hermes`/`opencode` commands and transparently redirects them through `ph wrap --auto-evolve`. Your output is unchanged.
How it works: a `preexec` hook in your shell detects `claude`/`claw`/`codex`/`hermes`/`opencode`/`pi` commands and transparently redirects them through `ph wrap --auto-evolve`. Your output is unchanged.

```bash
ph shell-hook status # check if installed
@@ -469,11 +471,12 @@ The Proposer reads **all of this** before generating the next candidate. It can
| `codex` | `codex exec` | OpenAI Codex CLI |
| `hermes` | `hermes chat -q` | Nous Research [Hermes Agent](https://github.com/NousResearch/hermes-agent) CLI |
| `opencode` | `opencode run` | OpenCode CLI |
| `pi` | `pi -p` | Minimal open-source [Pi](https://github.com/earendil-works/pi) coding agent (no permission popups) |
| `local` | — | Offline rule-based engine for development & testing |

`ph doctor` auto-detects all available backends and shows their status.

When you run `ph init --agent claude-code`, PolyHarness automatically generates a `CLAUDE.md` instruction file in the workspace, telling the agent how to behave as an optimization Proposer. Same for `CLAW.md`, `CODEX.md`, `AGENTS.md` (Hermes), `OPENCODE.md` — each agent's native instruction format.
When you run `ph init --agent claude-code`, PolyHarness automatically generates a `CLAUDE.md` instruction file in the workspace, telling the agent how to behave as an optimization Proposer. Same for `CLAW.md`, `CODEX.md`, `AGENTS.md` (Hermes and Pi), `OPENCODE.md` — each agent's native instruction format.
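The backend-to-file mapping in the paragraph above can be sketched as a lookup table. `INSTRUCTION_FILES` and `write_instructions` are hypothetical names, and treating `api`, `openai`, and `local` as having no instruction file is an assumption; the paragraph only lists the CLI agents.

```python
from __future__ import annotations

from pathlib import Path

# Backend -> native instruction file, as listed in the README paragraph.
# Hermes and Pi both read AGENTS.md.
INSTRUCTION_FILES: dict[str, str] = {
    "claude-code": "CLAUDE.md",
    "claw-code": "CLAW.md",
    "codex": "CODEX.md",
    "hermes": "AGENTS.md",
    "opencode": "OPENCODE.md",
    "pi": "AGENTS.md",
}


def write_instructions(workspace: Path, backend: str, body: str) -> Path | None:
    """Hypothetical: write the backend's instruction file, or return None if it has none."""
    name = INSTRUCTION_FILES.get(backend)
    if name is None:  # assumed: api/openai/local get no instruction file
        return None
    path = workspace / name
    path.write_text(body, encoding="utf-8")
    return path
```

Because Pi shares `AGENTS.md` with Hermes, supporting it adds one table entry rather than a new file format.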

#### Backend ensemble (adaptive selection)

@@ -540,7 +543,7 @@ search:
seed: null # RNG seed — set an int to make randomized runs reproducible

proposer:
backend: api # api | openai | claude-code | claw-code | codex | hermes | opencode | local
backend: api # api | openai | claude-code | claw-code | codex | hermes | opencode | pi | local
ensemble: [] # If non-empty, pick among these backends per iteration via a UCB bandit
bandit_c: 1.41421356 # UCB exploration constant (higher = more exploration)
model: claude-sonnet-4-6 # Model name (for api/openai backends)
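The `ensemble` and `bandit_c` settings describe a UCB bandit over backends. The sketch below is textbook UCB1: each backend scores `mean_reward + c * sqrt(ln N / n_i)`, where `N` is the total number of selections and `n_i` the selections of backend `i`. The default `1.41421356` is √2, the constant classic UCB1 uses. The `UCBSelector` class is hypothetical, and using the per-iteration evaluation score as the reward is an assumption; PolyHarness's own selector may differ in detail.

```python
from __future__ import annotations

import math


class UCBSelector:
    """Minimal UCB1 bandit over backend names (illustrative sketch)."""

    def __init__(self, arms: list[str], c: float = math.sqrt(2)) -> None:
        self.arms = list(arms)
        self.c = c  # exploration constant, cf. `bandit_c`
        self.counts = {arm: 0 for arm in self.arms}
        self.totals = {arm: 0.0 for arm in self.arms}

    def select(self) -> str:
        # Try every backend once before applying the UCB formula.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        n = sum(self.counts.values())

        def ucb(arm: str) -> float:
            mean = self.totals[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(n) / self.counts[arm])
            return mean + bonus  # higher c weights the exploration bonus more

        return max(self.arms, key=ucb)

    def update(self, arm: str, reward: float) -> None:
        """Record the reward (assumed: the iteration's score) for the chosen backend."""
        self.counts[arm] += 1
        self.totals[arm] += reward
```

With `ensemble: [claude-code, pi]`, each backend would be tried once, after which the one with the better average score is favored unless the other has been sampled too rarely.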
@@ -645,7 +648,7 @@ python -m polyharness --version
| `ph traces stats` | Summary statistics: total traces, scored count, agent distribution |
| `ph traces clear` | Remove collected traces (`--keep N` to retain newest, `-y` to skip confirm) |
| `ph evolve` | Trigger an online evolution cycle using collected traces as context |
| `ph shell-hook install` | Install shell hook to auto-wrap agent commands (claude, claw, codex, opencode) |
| `ph shell-hook install` | Install shell hook to auto-wrap agent commands (claude, claw, codex, hermes, opencode, pi) |
| `ph shell-hook uninstall` | Remove the shell hook from your rc file |
| `ph shell-hook status` | Check if the shell hook is installed |
| `ph upgrade` | Upgrade PolyHarness to the latest version |
@@ -661,7 +664,7 @@ python -m polyharness --version
### `ph init` options

```
--agent <name> Backend: claude-code | claw-code | codex | opencode | api | local
--agent <name> Backend: claude-code | claw-code | codex | hermes | opencode | pi | api | local
--workspace <dir> Workspace directory (default: current dir)
--base-harness <dir> Copy starting harness code into workspace
--task-dir <dir> Copy tasks/ folder and evaluate.py into workspace
@@ -777,7 +780,8 @@ polyharness/
│ │ ├── claw_code.py # claw -p
│ │ ├── codex.py # codex exec
│ │ ├── hermes.py # hermes chat -q
│ │ └── opencode.py # opencode run
│ │ ├── opencode.py # opencode run
│ │ └── pi.py # pi -p
│ └── templates/ # 5 built-in task templates
│ ├── text-classification/
│ ├── math-word-problems/
20 changes: 12 additions & 8 deletions README_CN.md
@@ -30,7 +30,7 @@
| | |
|---|---|
| **自动进化** | 通过迭代搜索探索 harness 变更,并把完整评估历史保存在同一个 workspace 中。 |
| **8 个 Agent 后端** | Claude Code · Claw Code · Codex · Hermes · OpenCode · API 直连 · OpenAI 兼容 · Local,可接入任何 CLI agent。 |
| **9 个 Agent 后端** | Claude Code · Claw Code · Codex · Hermes · OpenCode · Pi · API 直连 · OpenAI 兼容 · Local,可接入任何 CLI agent。 |
| **完整历史** | 每轮迭代的代码、分数、执行轨迹完整保留。Meta-Harness 论文报告非马尔可夫搜索优于盲目重试。 |
| **搜索树** | 可视化优化路径,对比任意两个候选的逐任务差异。 |
| **一条命令完成初始化** | `ph init --base-harness ... --task-dir ...`,复制文件、配置 workspace,一步完成。 |
@@ -235,7 +235,7 @@ PolyHarness 会通过沙盒编排将你的 Agent 的工作目录(CWD)限制

| 使用场景 | 配置方法 |
|----------|------------------|
| **受原生支持的 CLI Agent 工具** | 使用 `ph init --agent <name>`。系统会自动注入其专属提示词指令(如 `CLAUDE.md`)。<br>*(支持: claude-code, claw-code, codex, hermes, opencode)* |
| **受原生支持的 CLI Agent 工具** | 使用 `ph init --agent <name>`。系统会自动注入其专属提示词指令(如 `CLAUDE.md`)。<br>*(支持: claude-code, claw-code, codex, hermes, opencode, pi)* |
| **Anthropic API 直连** | 使用 `ph init --agent api`。在 `ph run` 前设置 `export ANTHROPIC_API_KEY="sk-ant-..."`。 |
| **OpenAI / 本地模型** | 使用 `ph init --agent openai`。然后配置 endpoint——参见下方 [本地模型配置](#本地模型配置) 章节。 |
| **CLI 命令被自定义 / 路径未响应** | 如果你的 CLI Agent 使用了非标命令(或未设置全局 PATH),请在初始化后手动修改 workspace 根目录下的 `config.yaml`:<br>`proposer: { cli_path: "npx @anthropic-ai/claude-code" }` |
@@ -306,6 +306,7 @@ ph wrap --auto-evolve claw -p "给支付服务写集成测试" # Claw
ph wrap --auto-evolve codex exec "给 API 客户端加上重试逻辑" # Codex
ph wrap --auto-evolve hermes chat -q "重构数据库连接池" # Hermes Agent
ph wrap --auto-evolve opencode run "修复不稳定的 parser 测试" # OpenCode
ph wrap --auto-evolve pi -p "收紧重试/退避逻辑" # Pi

# 本地模型 —— 直接包裹 CLI 命令
ph wrap --auto-evolve ollama run gemma3 "总结这篇文档" # Ollama
@@ -376,9 +377,10 @@ claw -p "写支付测试" # 同理——自动包裹
codex exec "加重试逻辑" # 同理
hermes chat -q "重构连接池" # 同理
opencode run "修复不稳定测试" # 同理
pi -p "收紧重试逻辑" # 同理
```

原理:shell 的 `preexec` 钩子检测到 `claude`/`claw`/`codex`/`hermes`/`opencode` 命令后,透明地通过 `ph wrap --auto-evolve` 转发。你的输出不会变。
原理:shell 的 `preexec` 钩子检测到 `claude`/`claw`/`codex`/`hermes`/`opencode`/`pi` 命令后,透明地通过 `ph wrap --auto-evolve` 转发。你的输出不会变。

```bash
ph shell-hook status # 查看是否已安装
@@ -469,11 +471,12 @@ Proposer 在生成下一个候选之前会读取**所有这些信息**。它能
| `codex` | `codex exec` | OpenAI Codex CLI |
| `hermes` | `hermes chat -q` | Nous Research [Hermes Agent](https://github.com/NousResearch/hermes-agent) CLI |
| `opencode` | `opencode run` | OpenCode CLI |
| `pi` | `pi -p` | 极简开源 [Pi](https://github.com/earendil-works/pi) 编码 agent(无权限弹窗) |
| `local` | — | 离线规则引擎,用于开发和测试 |

`ph doctor` 会自动检测所有可用后端并显示状态。

当你运行 `ph init --agent claude-code` 时,PolyHarness 会在 workspace 中自动生成 `CLAUDE.md` 指令文件,告诉 agent 如何作为优化 Proposer 工作。`CLAW.md`、`CODEX.md`、`AGENTS.md`(Hermes)、`OPENCODE.md` 也是同样的机制,每个 agent 都使用它自己的原生指令格式。
当你运行 `ph init --agent claude-code` 时,PolyHarness 会在 workspace 中自动生成 `CLAUDE.md` 指令文件,告诉 agent 如何作为优化 Proposer 工作。`CLAW.md`、`CODEX.md`、`AGENTS.md`(Hermes 和 Pi)、`OPENCODE.md` 也是同样的机制,每个 agent 都使用它自己的原生指令格式。

#### 后端集成(自适应择优)

@@ -540,7 +543,7 @@ search:
seed: null # 随机种子 — 设为整数可让带随机性的搜索可复现

proposer:
backend: api # api | openai | claude-code | claw-code | codex | hermes | opencode | local
backend: api # api | openai | claude-code | claw-code | codex | hermes | opencode | pi | local
ensemble: [] # 非空时,每轮用 UCB bandit 在这些后端中择优
bandit_c: 1.41421356 # UCB 探索常数(越大越偏探索)
model: claude-sonnet-4-6 # 模型名称(api/openai 后端使用)
@@ -645,7 +648,7 @@ python -m polyharness --version
| `ph traces stats` | 汇总统计:总 traces 数、已评分数、各 agent 分布 |
| `ph traces clear` | 清除已收集的 traces(`--keep N` 保留最新、`-y` 跳过确认) |
| `ph evolve` | 基于已收集的 traces 触发一轮在线进化循环 |
| `ph shell-hook install` | 安装 shell 钩子,自动包裹 agent 命令(claude、claw、codex、opencode) |
| `ph shell-hook install` | 安装 shell 钩子,自动包裹 agent 命令(claude、claw、codex、hermes、opencode、pi) |
| `ph shell-hook uninstall` | 从 rc 文件中移除 shell 钩子 |
| `ph shell-hook status` | 检查 shell 钩子是否已安装 |
| `ph upgrade` | 升级 PolyHarness 到最新版本 |
@@ -661,7 +664,7 @@ python -m polyharness --version
### `ph init` 选项

```
--agent <name> 后端: claude-code | claw-code | codex | opencode | api | local
--agent <name> 后端: claude-code | claw-code | codex | hermes | opencode | pi | api | local
--workspace <dir> Workspace 目录(默认:当前目录)
--base-harness <dir> 将起始 harness 代码复制到 workspace
--task-dir <dir> 将 tasks/ 文件夹和 evaluate.py 复制到 workspace
@@ -777,7 +780,8 @@ polyharness/
│ │ ├── claw_code.py # claw -p
│ │ ├── codex.py # codex exec
│ │ ├── hermes.py # hermes chat -q
│ │ └── opencode.py # opencode run
│ │ ├── opencode.py # opencode run
│ │ └── pi.py # pi -p
│ └── templates/ # 5 个内置任务模板
│ ├── text-classification/
│ ├── math-word-problems/
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "polyharness",
"version": "0.2.4",
"version": "0.2.5",
"description": "Make your AI agent evolve automatically through iterative harness optimization.",
"keywords": [
"agent",
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "polyharness"
version = "0.2.4"
version = "0.2.5"
description = "Automated harness optimization for AI agents — make your agent evolve."
readme = "README.md"
license = "MIT"
14 changes: 7 additions & 7 deletions src/polyharness/cli.py
@@ -155,7 +155,7 @@ def new(project_dir: str):
@click.option(
"--agent",
type=click.Choice(
["claude-code", "claw-code", "codex", "hermes", "opencode", "api", "openai", "local"],
["claude-code", "claw-code", "codex", "hermes", "opencode", "pi", "api", "openai", "local"],
case_sensitive=False,
),
default="api",
@@ -239,7 +239,7 @@ def init(
@click.option(
"--backend",
type=click.Choice(
["api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "local"],
["api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "pi", "local"],
case_sensitive=False,
),
default=None,
@@ -1867,7 +1867,7 @@ def evolve(workspace: str, store: str | None, max_iterations: int | None):
command -v ph >/dev/null 2>&1 || return
local cmd="$1"
case "$cmd" in
claude\ *|claw\ *|codex\ *|hermes\ *|opencode\ *)
claude\ *|claw\ *|codex\ *|hermes\ *|opencode\ *|pi\ *)
eval "ph wrap --auto-evolve $cmd"
# Return non-zero to prevent original command from running (zsh preexec)
return 1
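The `case` arms above match a command line that starts with an agent name followed by a space. The Python sketch below approximates that match with `fnmatch.fnmatchcase`, whose `*` behaves like the shell glob for these simple patterns; `would_wrap` and `WRAPPED_PATTERNS` are hypothetical names, not part of the CLI. It shows that the escaped space in `pi\ *` keeps unrelated commands such as `pip install` from being wrapped, and that a bare `pi` with no arguments is not matched either.

```python
from fnmatch import fnmatchcase

# One glob per `case` arm in the hook: agent name, a space, then anything.
WRAPPED_PATTERNS = ("claude *", "claw *", "codex *", "hermes *", "opencode *", "pi *")


def would_wrap(cmd: str) -> bool:
    """Return True if the hook's `case` statement would route `cmd` through `ph wrap`."""
    return any(fnmatchcase(cmd, pattern) for pattern in WRAPPED_PATTERNS)
```

The short name `pi` makes the space significant: without it, the pattern `pi*` would also capture `pip` and `ping`.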
@@ -1931,8 +1931,8 @@ def install(rc: str | None):
"""Install shell hook to auto-wrap agent commands.

Adds a preexec hook to your shell rc file so that commands like
`claude -p ...`, `claw -p ...`, `codex exec ...`, `hermes chat -q ...`, `opencode run ...`
are automatically wrapped with `ph wrap --auto-evolve`.
`claude -p ...`, `claw -p ...`, `codex exec ...`, `hermes chat -q ...`, `opencode run ...`,
`pi -p ...` are automatically wrapped with `ph wrap --auto-evolve`.
"""
rc_path = Path(rc) if rc else _detect_shell_rc()

@@ -1948,7 +1948,7 @@ def install(rc: str | None):
console.print(f"Run [bold]source {rc_path}[/bold] or open a new terminal to activate.")
console.print()
console.print("Agent commands that will be auto-wrapped:")
console.print(" claude, claw, codex, hermes, opencode")
console.print(" claude, claw, codex, hermes, opencode, pi")
console.print()
console.print("To remove: [bold]ph shell-hook uninstall[/bold]")

@@ -1993,7 +1993,7 @@ def hook_status(rc: str | None):

if _hook_installed(rc_path):
console.print(f"[green]Hook is installed in {rc_path}[/green]")
console.print("Auto-wrapped commands: claude, claw, codex, hermes, opencode")
console.print("Auto-wrapped commands: claude, claw, codex, hermes, opencode, pi")
else:
console.print(f"[yellow]Hook is not installed[/yellow] ({rc_path})")
console.print("Run [bold]ph shell-hook install[/bold] to set it up.")
2 changes: 1 addition & 1 deletion src/polyharness/config.py
@@ -12,7 +12,7 @@
# `backend` field and the optional `ensemble` list (which gets validation for
# free by reusing this Literal alias).
BackendName = Literal[
"api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "local"
"api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "pi", "local"
]
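The comment above says `backend` and `ensemble` share this alias for validation. The config model's validation machinery is not shown in this diff, so this is a plain-Python sketch of the same idea: `typing.get_args` turns the `Literal` into the set of allowed names. `VALID_BACKENDS` and `validate_backends` are hypothetical.

```python
from __future__ import annotations

from typing import Literal, get_args

# Same alias as in config.py after this change.
BackendName = Literal[
    "api", "openai", "claude-code", "claw-code", "codex", "hermes", "opencode", "pi", "local"
]

# Derive the allowed set from the alias so there is a single list to update.
VALID_BACKENDS: frozenset[str] = frozenset(get_args(BackendName))


def validate_backends(backend: str, ensemble: list[str]) -> None:
    """Hypothetical check mirroring what the Literal enforces on `backend` and `ensemble`."""
    unknown = [name for name in (backend, *ensemble) if name not in VALID_BACKENDS]
    if unknown:
        raise ValueError(
            f"unknown backend(s) {unknown}; expected one of {sorted(VALID_BACKENDS)}"
        )
```

The `click.Choice` lists in `cli.py` spell out the same names by hand, which is why this change edits them separately; deriving them from one alias would keep them from drifting.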


2 changes: 2 additions & 0 deletions src/polyharness/doctor.py
@@ -51,6 +51,8 @@ def run_doctor() -> None:
click.echo("Recommended: codex (OpenAI agent)")
elif "opencode" in available:
click.echo("Recommended: opencode (open-source agent)")
elif "pi" in available:
click.echo("Recommended: pi (minimal open-source agent)")
elif api_key:
click.echo("Recommended: api (Anthropic API direct)")
else:
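The branch order above means Pi is recommended only when the backends checked before it (among them Codex and OpenCode) are absent, and ahead of the API fallback. A sketch of that precedence, covering only the branches visible in this hunk: the earlier branches and the text of the final `else` are not shown, so `recommend` returns `None` there. Probing `PATH` with `shutil.which` in `detect_available`, and the `BINARIES` mapping, are assumptions about how `ph doctor` detects CLIs.

```python
from __future__ import annotations

import shutil

# Hypothetical backend -> executable mapping used for PATH probing.
BINARIES: dict[str, str] = {"codex": "codex", "opencode": "opencode", "pi": "pi"}

# Only the tail of the precedence chain is visible in this hunk.
PREFERENCE: list[str] = ["codex", "opencode", "pi"]


def detect_available() -> set[str]:
    """Backends whose executable is found on PATH (assumed detection method)."""
    return {name for name, exe in BINARIES.items() if shutil.which(exe) is not None}


def recommend(available: set[str], api_key: str | None) -> str | None:
    """First available CLI in preference order, then the API if a key is set."""
    for name in PREFERENCE:
        if name in available:
            return name
    if api_key:
        return "api"
    return None  # the diff does not show what the final `else` recommends
```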
3 changes: 3 additions & 0 deletions src/polyharness/proposer/adapters/__init__.py
@@ -12,13 +12,15 @@
from polyharness.proposer.adapters.codex import CodexAdapter
from polyharness.proposer.adapters.hermes import HermesAdapter
from polyharness.proposer.adapters.opencode import OpenCodeAdapter
from polyharness.proposer.adapters.pi import PiAdapter

ADAPTER_REGISTRY: dict[str, type[CLIAdapter]] = {
"claude-code": ClaudeCodeAdapter,
"claw-code": ClawCodeAdapter,
"codex": CodexAdapter,
"hermes": HermesAdapter,
"opencode": OpenCodeAdapter,
"pi": PiAdapter,
}


@@ -42,6 +44,7 @@ def get_adapter(backend: str) -> CLIAdapter:
"CodexAdapter",
"HermesAdapter",
"OpenCodeAdapter",
"PiAdapter",
"ADAPTER_REGISTRY",
"get_adapter",
]
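The body of `get_adapter` is outside this diff. The sketch below assumes it looks the name up in `ADAPTER_REGISTRY`, instantiates the class, and raises on an unknown backend; `CLIAdapter` here is a stand-in, and the `command` attribute is hypothetical. With a registry like this, adding a backend takes one import plus one dict entry, which is the shape of the change above.

```python
from __future__ import annotations


class CLIAdapter:
    """Stand-in base class; the real CLIAdapter interface is not shown in this diff."""

    command: tuple[str, ...] = ()  # hypothetical: base command the adapter runs


class OpenCodeAdapter(CLIAdapter):
    command = ("opencode", "run")


class PiAdapter(CLIAdapter):
    command = ("pi", "-p")


ADAPTER_REGISTRY: dict[str, type[CLIAdapter]] = {
    "opencode": OpenCodeAdapter,
    "pi": PiAdapter,
}


def get_adapter(backend: str) -> CLIAdapter:
    """Instantiate the adapter registered for `backend` (sketch of the lookup)."""
    try:
        adapter_cls = ADAPTER_REGISTRY[backend]
    except KeyError:
        known = ", ".join(sorted(ADAPTER_REGISTRY))
        raise ValueError(f"no CLI adapter for {backend!r}; known: {known}") from None
    return adapter_cls()
```

Non-CLI backends such as `api` or `local` would not appear in this registry, so a caller would route them elsewhere before calling `get_adapter`.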