oc-repl

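An interactive REPL that feels close to Codex / Claude Code, for running the terminal-agent models common in academic work. It is built for hands-on trials and demo recording; it does not try to match the polish of those two products, only to cover the commonly used protocols.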

It supports 4 on-the-wire protocols:

`--protocol`	Source	What the model emits per turn
`camel-terminal-toolkit` (default)	camel-ai `ChatAgent` + `TerminalToolkit`: the actual training distribution of the `HansBug/OpenClaw-RL` checkpoints	OpenAI `tool_calls` invoking camel's 4 functions (`shell_exec(id, command, block, timeout)` / `shell_view(id)` / `shell_write_to_process(id, command)` / `shell_write_content_to_file(content, file_path)`); the system prompt is a verbatim copy of the training-time `get_developer_agent_prompt(...)` output, with `/no_think` appended
`openai-tools`	Hand-written simplified variant (one `shell(command)` plus one `write_file(path, content)`)	Structured OpenAI `tool_calls`; convenient for ad-hoc demos but does not match the training distribution
`terminus-json`	`terminal_bench/agents/prompt-templates/terminus-json-plain.txt`, used for `tb run --agent terminus-2` evaluation	One JSON object per turn: `{analysis, plan, commands[].keystrokes, task_complete}`
`terminus-xml`	`terminal_bench/agents/prompt-templates/terminus-xml-plain.txt`, the XML variant of the above	`<response><analysis/><plan/><commands><keystrokes>…</keystrokes></commands><task_complete>…</task_complete></response>`

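Why this tool exists: neither camel-ai nor terminal-bench ships an official interactive REPL. camel offers a Python API plus a Gradio web UI, and terminal-bench offers the one-shot tb run CLI. We wanted a Codex-style chat interface for hands-on trials and demo recordings of qwen3-8b-rl-iter215, the checkpoint trained with HansBug/OpenClaw-RL, so we wrote oc-repl. Any model that speaks one of the 4 protocols above will work.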

Which protocol is this checkpoint's "native language"?

For every checkpoint trained with HansBug/OpenClaw-RL (qwen3-8b-rl-*), the answer is always camel-terminal-toolkit: it is the on-the-wire surface the RL rollouts actually used. Evidence:

The launch command of run msp60ius (the run that produced iter215) has no --terminal-agent-type, so getattr(args, "terminal_agent_type", "camel_agent") at generate.py:142 falls back to "camel_agent".
rollout_agent.py:97-106 accepts only that one value and raises ValueError for anything else; the repository has no terminus code path at all.
agent/camel_agent.py subclasses camel.agents.ChatAgent; remote/terminal_env.py:163-185 wraps the 4 functions of camel.toolkits.TerminalToolkit into the tool_schemas fed to the rollout.
On the sglang side, tool_call_parser: qwen25 (see configs/rollout_qwen3.yaml) turns the model's <tool_call>{…}</tool_call> markup back into structured tool_calls.
In the training rollout log, <tool_call> appears 628 times, while the terminus-2 JSON key "task_complete" appears 0 times.
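
To make the <tool_call> markup point concrete, here is a rough Python sketch of what a tool-call parser does conceptually. It is not sglang's implementation: `extract_tool_calls` is a hypothetical name, and the `{"name": ..., "arguments": ...}` payload shape is an assumption based on Qwen-style markup.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return OpenAI-style tool_calls parsed from the markup (illustrative only)."""
    calls = []
    for index, match in enumerate(TOOL_CALL_RE.finditer(text)):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole turn
        calls.append({
            "id": f"call_{index}",  # hypothetical id scheme
            "type": "function",
            "function": {
                "name": payload.get("name"),
                # OpenAI tool_calls carry the arguments as a JSON string
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    return calls
```

Text without any markup yields an empty list, which is the situation the 0-tool-call failure mode described later in this README corresponds to.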

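Terminus-2 is used only at evaluation time (tb run --agent terminus-2) and never enters the RL rollouts. So --protocol terminus-json is "another protocol this model can be coaxed into speaking", not "the protocol it was trained on".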

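Key features

One-line launch: oc-repl --sandbox docker:openclaw-fixperm drops you into the REPL.
4 protocols, one UI: --protocol {camel-terminal-toolkit, openai-tools, terminus-json, terminus-xml}; rendering, sandbox and engine are fully shared.
Thinking is folded by default into a thinking… 3.2s · 184 chars spinner. Toggle with --show-thinking or /think.
Tool calls render as panels, similar to Codex: the command in the title bar, indented stdout, exit 0 on green / exit N on red. Stdout/stderr is automatically clipped to ≤10 lines × ≤180 characters, with a … (N more lines clipped) notice, so noisy commands do not flood the screen.
oc-repl exec "<task>" is the non-interactive mode, the counterpart of codex exec. --json emits a machine-readable result.
Optional post-agent scoring hook: --verify "<bash cmd>" or --verify-file path/to/check.sh. After the agent reports done, the scoring script runs and the ✓/✗ result is shown in a green/red panel. It exists for the case where an agent reporting task_complete does not mean the task was actually done correctly.
Minimal dependencies: rich plus the standard library. Chat completions go over raw SSE, with no openai SDK and no async runtime.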
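
The output clipping rule in the list above can be sketched as follows. This is a minimal illustration under the stated limits (10 lines, 180 characters), not the code in ui.py; `clip_output` and the choice to mark a truncated line with a trailing "…" are assumptions.

```python
def clip_output(text, max_lines=10, max_chars=180):
    """Clip text to max_lines lines of at most max_chars characters each."""
    lines = text.splitlines()
    kept = []
    for line in lines[:max_lines]:
        if len(line) > max_chars:
            # Assumed behaviour: keep the head of the line and mark the cut.
            line = line[: max_chars - 1] + "…"
        kept.append(line)
    hidden = len(lines) - len(kept)
    if hidden > 0:
        kept.append(f"… ({hidden} more lines clipped)")
    return "\n".join(kept)
```

Short output passes through unchanged, so the panel only grows a notice line when something was actually dropped.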

Installation

git clone https://github.com/HansBug/oc-repl
cd oc-repl
pip install -e .

Requires Python ≥ 3.9, plus either access to a docker daemon (sandbox mode, recommended) or acceptance of --sandbox local, which runs commands in the host shell (dangerous, development only).

Quick start (against `qwen3-8b-rl-iter215`)

Assuming sglang is already up per HansBug/OpenClaw-RL issue #13, along with a long-lived container for the task:

# 1. (One-time) start the sandbox container, using the same image as terminal-bench
docker run -d --name openclaw-fixperm -w /app tb__fix-permissions__client sleep infinity

# 2. Interactive REPL
oc-repl \
    --api-base http://127.0.0.1:30000/v1 \
    --model qwen3-8b-rl-iter215 \
    --sandbox docker:openclaw-fixperm
# The default --protocol camel-terminal-toolkit already matches the training distribution; no need to pass it explicitly

# 3. Or a one-shot exec
oc-repl exec \
    --sandbox docker:openclaw-fixperm \
    "Fix /app/process_data.sh so it can run, then run it once."

The interactive session looks roughly like this:

╭─ oc-repl ──────────────────────────────────────────────────────╮
│ model     qwen3-8b-rl-iter215                                  │
│ endpoint  http://127.0.0.1:30000/v1                            │
│ sandbox   docker:openclaw-fixperm                              │
│ protocol  camel-terminal-toolkit                               │
│ thinking  hidden (use /think to toggle)                        │
│                                                                │
│ Type a task. /help for commands. Ctrl+D to exit.               │
╰────────────────────────────────────────────────────────────────╯

 › Fix /app/process_data.sh so it can run, then run it.

running task …
╭─  ▸ shell ────────────────────╮
│ chmod +x /app/process_data.sh │
│  exit 0                       │
╰───────────────────────────────╯
╭─  ▸ shell ───────────────────╮
│ /app/process_data.sh         │
│ Data processed successfully! │
│  exit 0                      │
╰──────────────────────────────╯
 ✓ rounds=2  commands=2  task_complete=True

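Command reference

Slash commands (interactive mode)

Command	Effect
`/help`	List commands
`/think`	Toggle display of `<think>` content
`/reset`	Clear the conversation history (sandbox container state is untouched)
`/quit` (or `Ctrl-D`)	Exit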

Command-line flags

--api-base URL          OpenAI-compatible endpoint (default http://127.0.0.1:30000/v1)
--api-key KEY           Default sk-dummy (sglang does not validate it)
--model NAME            Served model name; default qwen3-8b-rl-iter215
--protocol P            camel-terminal-toolkit (default; matches the OpenClaw-RL training distribution)
                        | openai-tools | terminus-json | terminus-xml | auto
--sandbox SPEC          local (default; dangerous, runs in the host shell)
                        or docker:CONTAINER (exec into an already-running container)
--show-thinking         Stream <think> content instead of folding it into a spinner
--temperature 0.2       Default 0.2
--max-tokens 4096       Default 4096
--cmd-timeout 30        Per-command sandbox timeout in seconds
--verify "CMD"          Run a bash check in the sandbox after the agent finishes. Exit 0 → ✓ verified; non-zero → ✗ failed
--verify-file PATH      Same, but the script is copied from a local file into the sandbox and then run

Flags that are omitted fall back to environment variables (OPENAI_API_BASE, OPENAI_API_KEY, OC_REPL_MODEL, OC_REPL_PROTOCOL, OC_REPL_SANDBOX).
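
One common way to get this precedence (explicit flag, then environment variable, then built-in default) with argparse is sketched below. It is an illustration, not the contents of cli.py; `build_parser` and its `env` parameter are hypothetical, and only five of the flags are shown.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from environment variables when set."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="oc-repl")
    # Each default is resolved once: env var if present, else the documented default.
    parser.add_argument("--api-base", default=env.get("OPENAI_API_BASE", "http://127.0.0.1:30000/v1"))
    parser.add_argument("--api-key", default=env.get("OPENAI_API_KEY", "sk-dummy"))
    parser.add_argument("--model", default=env.get("OC_REPL_MODEL", "qwen3-8b-rl-iter215"))
    parser.add_argument("--protocol", default=env.get("OC_REPL_PROTOCOL", "camel-terminal-toolkit"))
    parser.add_argument("--sandbox", default=env.get("OC_REPL_SANDBOX", "local"))
    return parser
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.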

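Non-interactive mode (`exec`)

oc-repl exec [shared flags] [--json] "<task>"

Runs one task, prints a Codex-style trace, and exits. Exit codes:

Code	Meaning
`0`	`--verify[-file]` passed; or no verify hook and the agent reported `task_complete=true`
`2`	No verify hook and the agent did not report `task_complete`
`3`	The verify hook ran and failed (the agent claimed completion, but the objective check contradicted it)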
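
Read as a decision rule, the table above amounts to roughly the following. This is a sketch of the documented mapping, not code from oneshot.py; `exit_code` is a hypothetical helper, and it assumes that whenever a verify hook ran, its result alone decides the outcome.

```python
from typing import Optional


def exit_code(task_complete: bool, verify_passed: Optional[bool]) -> int:
    """Map a finished run to an exit code.

    verify_passed is None when no verify hook was configured.
    """
    if verify_passed is not None:
        # A hook ran: its verdict overrides the agent's self-report.
        return 0 if verify_passed else 3
    return 0 if task_complete else 2
```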

--json replaces the closing summary with a JSON dump:

{
  "instruction": "...",
  "rounds": 2,
  "commands": 4,
  "task_complete": true,
  "last_summary": "...",
  "verify": {
    "passed": true,
    "returncode": 0,
    "output_tail": "OK: /app/recovered/credentials.txt\nPASS — all checks ok"
  }
}
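
A script that consumes this result might pull the object out of the captured output as sketched below. The sketch assumes the JSON is pretty-printed with its opening brace at the start of a line and that nothing follows it; `parse_exec_json` is a hypothetical helper, not part of oc-repl.

```python
import json


def parse_exec_json(stdout):
    """Decode the trailing JSON object from captured `oc-repl exec --json` output."""
    lines = stdout.splitlines()
    # Walk backwards to the last line that opens a top-level object;
    # nested objects such as "verify" are indented and therefore skipped.
    for start in range(len(lines) - 1, -1, -1):
        if lines[start].startswith("{"):
            return json.loads("\n".join(lines[start:]))
    raise ValueError("no JSON object found in output")
```

A caller could then check `result["verify"]["passed"]` when a hook was configured and fall back to `result["task_complete"]` otherwise.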

Verify hook example

The recover-obfuscated-files task requires restoring two files to /app/recovered/credentials.txt and /app/recovered/project_alpha.log, with the correct contents. A simple verify script:

#!/usr/bin/env bash
# check_recover.sh: exits 0 only if both recovered files exist and contain the expected text.
set -e
errors=0
check() {
    if [ -f "$1" ] && grep -qF -- "$2" "$1"; then echo "OK: $1"
    else echo "BAD: $1"; errors=$((errors+1))
    fi
}
check /app/recovered/credentials.txt 'P0$$wOrd123!'
check /app/recovered/project_alpha.log 'Log entry 1: System initialized.'
[ "$errors" -eq 0 ] && { echo PASS; exit 0; } || { echo "FAIL ($errors)"; exit 1; }

oc-repl exec --sandbox docker:openclaw-recover \
    --verify-file check_recover.sh \
    "Decode each *.b64_content in /app/sensitive_data/ — basename and content are both base64. Restore them into /app/recovered/."

The output ends with either:

╭─  ✓ verified  ─────────────────────────────╮
│ bash check_recover.sh                      │
│ OK: /app/recovered/credentials.txt         │
│ OK: /app/recovered/project_alpha.log       │
│ PASS                                       │
╰────────────────────────────────────────────╯
✓ rounds=2  commands=2  task_complete=True  verify=✓

or (the agent cut corners partway through):

╭─  ✗ verification failed (exit 1)  ─────────╮
│ bash check_recover.sh                      │
│ BAD: /app/recovered/credentials.txt        │
│ FAIL (1)                                   │
╰────────────────────────────────────────────╯
✓ rounds=1  commands=2  task_complete=True  verify=✗ (exit 1)

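The second case is the entire reason this hook exists: a self-reported task_complete=true is only the agent's word, while verify tells you whether the world actually ended up in the state the user wanted.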
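
Conceptually the hook is a single command run inside the container whose exit status becomes the verdict. The sketch below shows that idea under assumptions: `run_verify` and its injectable `runner` parameter are hypothetical, the returned keys mirror the "verify" object of the --json output, and keeping the last 10 lines as `output_tail` is a guess.

```python
import subprocess


def run_verify(container, command, runner=subprocess.run):
    """Run a bash check inside a running container and report pass/fail."""
    proc = runner(
        ["docker", "exec", container, "bash", "-c", command],
        capture_output=True,
        text=True,
    )
    output = (proc.stdout + proc.stderr).strip()
    return {
        "passed": proc.returncode == 0,  # exit 0 means verified
        "returncode": proc.returncode,
        "output_tail": "\n".join(output.splitlines()[-10:]),
    }
```

The `runner` argument exists only so the function can be exercised without a docker daemon; by default it shells out through `subprocess.run`.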

Protocol details

`camel-terminal-toolkit` (default)

Matches the on-the-wire protocol used when training HansBug/OpenClaw-RL:

The schemas of the 4 tools are captured from FunctionTool.get_openai_tool_schema() in camel-ai==0.2.90 and hardcoded, including the ["boolean", "null"] union types, "strict": true, and other JSON-schema details the model saw during training.
The system prompt is the full output of terminal-rl/agent/camel_agent.py::get_developer_agent_prompt(...) under system='Linux (in Docker)', machine='x86_64', is_workforce=False, non_think_mode=True, ending with /no_think.
/no_think is essential: non_think_mode=True is the training default (rollout_agent.py:62), so the model never generated a <think> block during RL. With thinking left on, tool-call adherence collapses (observed first-hand: on the recover task the model wrote 1500+ tokens of prose and then issued 0 tool calls).
Sandbox mapping: shell_exec is fully faithful; shell_write_content_to_file is 100% faithful via a here-doc; shell_view and shell_write_to_process degrade to stub output, because the oc-repl sandbox is a stateless docker exec that keeps no per-id tmux session.
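
The here-doc mapping for shell_write_content_to_file could look roughly like this. It is a sketch, not the code in sandbox.py: `heredoc_write_command` and the `OC_REPL_EOF` delimiter are hypothetical names.

```python
import shlex


def heredoc_write_command(content, file_path, delimiter="OC_REPL_EOF"):
    """Build a bash command that writes content to file_path via a quoted here-doc."""
    # Pick a delimiter that does not appear as a full line in the content.
    while delimiter in content.splitlines():
        delimiter += "_X"
    # A here-doc body must end with a newline before the closing delimiter.
    body = content if content.endswith("\n") else content + "\n"
    # Quoting the delimiter ('EOF') disables $-expansion and backticks in the body.
    return f"cat > {shlex.quote(file_path)} <<'{delimiter}'\n{body}{delimiter}\n"
```

One caveat of this particular sketch: content that lacks a trailing newline gains one, so a byte-exact implementation would need extra handling for that case.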

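`openai-tools` (simplified)

Exposes only two tools: shell(command) and write_file(path, content). It is simpler than camel-terminal-toolkit but does not match the training distribution. Use it when an ad-hoc demo wants a lighter protocol; use camel-terminal-toolkit when formally measuring adherence of the qwen3-8b-rl-iter215 checkpoint.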
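
In OpenAI function-calling format, two such tools would be declared roughly as below. Only the tool names and parameter names come from this README; the descriptions, the `required` lists and the `tool_names` helper are assumptions for illustration.

```python
# Hypothetical OpenAI-style tool schemas for the two tools named above.
# Descriptions and "required" lists are assumptions, not oc-repl's actual text.
OPENAI_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "shell",
            "description": "Run a bash command in the sandbox and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file in the sandbox.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
]


def tool_names(tools):
    """Return the function names advertised to the model."""
    return [tool["function"]["name"] for tool in tools]
```

A list like this is what goes into the `tools` field of a chat-completions request body.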

`terminus-json` / `terminus-xml`

The protocol used by tb run --agent terminus-2: each turn the model emits a single {analysis, plan, commands[], task_complete} JSON object (or its XML variant). These two oc-repl protocols parse the model output and run the commands[].keystrokes in the sandbox.
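
Parsing one terminus-json turn can be sketched as follows. This is an illustration rather than the contents of terminus_json.py; `parse_terminus_json` is a hypothetical name, and tolerating prose before the object is an assumption about how lenient a parser wants to be.

```python
import json


def parse_terminus_json(reply):
    """Extract the keystrokes and the completion flag from one terminus-json turn."""
    # Models sometimes wrap the object in prose; decode from the first "{".
    start = reply.find("{")
    if start < 0:
        raise ValueError("no JSON object in reply")
    obj, _end = json.JSONDecoder().raw_decode(reply[start:])
    keystrokes = [
        command["keystrokes"]
        for command in obj.get("commands", [])
        if "keystrokes" in command
    ]
    return keystrokes, bool(obj.get("task_complete", False))
```

`raw_decode` stops at the end of the first complete object, so trailing text after the JSON does not cause a failure.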

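commands[].keystrokes are sent to the shell verbatim, but tmux special key sequences such as C-c and C-d are not forwarded as signals in oc-repl's docker-exec sandbox, because no live tmux session is maintained. If you need that fidelity, go back to tb run --agent terminus-2.

qwen3-8b-rl-iter215 was not RL-trained on the terminus-* protocols, so adherence under these two protocols reflects only the base model's ability to imitate JSON / XML and has nothing to do with the training distribution.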

Sandbox

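local              Runs in the host shell. This is the default, and it is dangerous.
                   The banner highlights the sandbox line in yellow as a reminder.
docker:NAME        Execs into an **already-running** container. Recommended for demos.
                   You must `docker run -d` the container yourself; oc-repl only attaches and never spawns one.

docker: mode automatically detects whether the current shell is in the docker group; if not, it wraps every command as sg docker -c '...'.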
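
The group check and the sg wrapping could be done along these lines on a Unix host. This is a sketch under assumptions, not the code in sandbox.py; `in_docker_group` and `wrap_for_docker` are hypothetical names.

```python
import grp
import os
import shlex


def in_docker_group():
    """Return True if the current process already has the docker group."""
    try:
        docker_gid = grp.getgrnam("docker").gr_gid
    except KeyError:
        return False  # no docker group on this host
    return docker_gid in os.getgroups() or docker_gid == os.getegid()


def wrap_for_docker(argv, in_group):
    """Return argv unchanged, or wrapped so that `sg docker` runs it."""
    if in_group:
        return list(argv)
    # sg takes the command as one string, so the argv has to be shell-quoted.
    return ["sg", "docker", "-c", shlex.join(argv)]
```

For example, `wrap_for_docker(["docker", "exec", "c", "echo hi"], False)` produces `["sg", "docker", "-c", "docker exec c 'echo hi'"]`.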

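Why must the container be started in advance?

So that a multi-turn REPL accumulates sandbox state: the directory the agent cd'd into, the background processes it started and the temp files it wrote in one turn are all still there in the next.
To skip the 1-3 seconds of container startup per command, which keeps recordings better paced.
To let users bring their own image: tb__hello-world__client, ubuntu:22.04 or a project's own dev env all work, and oc-repl assumes nothing about the image contents.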

Repository layout

src/oc_repl/
├── cli.py             argparse entry point; exposes the `oc-repl` command
├── client.py          Streaming OpenAI-compatible chat client on raw SSE (no SDK dependency)
├── engine.py          Turn-execution loop shared by REPL and exec modes
├── oneshot.py         Implementation of `oc-repl exec`
├── repl.py            Interactive mode (banner + `›` prompt + slash commands)
├── sandbox.py         The two sandbox backends, docker and local
├── ui.py              rich-based banner / thinking folder / tool block / prompt
└── protocols/
    ├── base.py        Protocol + ParsedTurn + Command dataclasses
    ├── camel_terminal_toolkit.py    Byte-level replica of the camel-ai training distribution (default)
    ├── openai_tools.py              Simplified 2-tool variant
    ├── terminus_json.py             terminus-2 JSON
    └── terminus_xml.py              terminus-2 XML
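
As an illustration of what "raw SSE" means for client.py, the sketch below extracts content deltas from the lines of an OpenAI-compatible streaming response. It is not the project's code; `iter_sse_deltas` is a hypothetical name, and it handles only the `data:` field and the `[DONE]` sentinel.

```python
import json


def iter_sse_deltas(lines):
    """Yield content deltas from the SSE lines of a chat-completions stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore comments, blank keep-alives and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

Because the response object returned by `urllib.request.urlopen` can be iterated line by line, a generator like this can be fed directly from the standard library.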

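Current limitations

No history search, autocompletion, or fuzzy matching of / commands. This is a tool for manual probing and demo recording, not a replacement for Codex / Claude Code.
No real tmux backend: terminus commands[].keystrokes run as bash commands rather than going through a tmux pipe, so they lose interactive TUI semantics in oc-repl (vim / less are unusable).
No advanced orchestration across multiple tools beyond what each protocol itself defines.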

License

MIT; see LICENSE.
