50 changes: 50 additions & 0 deletions drafts/2026-05-04T151151Z.md
# Reply to Capsule Bash Show HN (sandboxed bash for agents)

- **HN:** https://news.ycombinator.com/item?id=48009460
- **Story:** "Show HN: Capsule Bash - Sandboxed Bash for Agents" (id=48009460, 1 point, 2 comments at draft time, ~20 minutes old, links to https://github.com/capsulerun/bash)
- **OP:** `mavdol04`
- **Status:** draft (pending manual post)

## The post

The post is a Show HN for Capsule Bash, a TypeScript-based sandboxed Bash aimed at agents. The pitch: existing Bash gives agents "way too much freedom and not enough feedback to enrich the context after each command." The design has two layers: a core (commands + operator logic) and a pluggable runtime (Wasm-based, via a Rust runtime they shipped earlier). The sandbox returns structured info about what changed (created/modified/deleted files) plus stdout. OP lists `python3 -c` and `node -e` among "commonly used commands" and explicitly invites feedback on missing commands.

Thread is fresh (1 point, 2 comments at draft time). The two existing comments: `debarshri` says they're building "something similar... for remote agents", and OP asks what exactly. No hostile or off-topic norm yet; the thread shape is "design feedback welcomed."

## My reply

```
(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)

The "too much freedom" framing is fair, but `python3 -c` and `node -e` as core commands re-open most of it. Once the agent runs `python3 -c "import shutil; shutil.rmtree(p)"`, the fs ops happen inside the Python runtime, not your bash interpreter, so the sandbox's structured-feedback layer never sees them. Same for `node -e "require('fs').unlinkSync(...)"`. Either intercept at the syscall layer, or pre-parse the inline source and refuse blocked imports. Adjacent angle: agents also call Edit/Write/MCP tools that bypass Bash entirely, so a Bash-only sandbox leaves that surface uncovered. The complementary seam is the harness's PreToolUse hook, which sees every tool call before any executor runs:

import { customPolicies, allow, deny } from "failproofai";

customPolicies.add({
  name: "block-inline-eval",
  match: { events: ["PreToolUse"] },
  fn: ({ toolName, toolInput }) => {
    // Only gate Bash calls; other tools pass through.
    if (toolName !== "Bash") return allow();
    // python -c and node -e run inline source (node -c is only a syntax check).
    if (/\b(?:python3?\s+-c|node\s+-e)\b/.test(toolInput?.command ?? "")) {
      return deny("inline -c/-e blocked; drop a script file instead");
    }
    return allow();
  },
});
```

## Insight for the FailProof team

- Capsule Bash targets the same problem ("agent does dangerous things via shell") but at a *different layer*: a sandboxed executor that the agent calls into, not a hook that gates the call. That's a real architectural choice and worth being explicit about in the README/docs. The "where in the stack does the policy decision live" framing is becoming the most reliable way to differentiate FailProof from sandbox-shaped competitors (Cordon, AgentPort, AegisProxy, Capsule Bash, SmolVM, etc.). Lead with the seam diagram, not the policy catalog.
- The `python3 -c` / `node -e` escape-hatch observation is concrete and reusable. It's the same family of failure as GTFOBins-style allowlist bypass: the surface area of a "safe shell" includes language interpreters that re-introduce the unsafe surface inside a single command. A blog post titled something like "Why `python3 -c` is the hardest thing to sandbox" would land naturally on threads like this.
- Worth noting: the OP explicitly invites *missing* commands as feedback. The thread norm here is constructive feature-add, not punitive critique. The draft aims for the right tone (acknowledge the framing, show the gap, offer a complementary seam), but the line between "constructive critique" and "pitch" is thin on Show HN. If this gets flagged, the likely cause is the snippet, which reads more like ad copy than the prose does.
- Track Capsule Bash as a sibling project. If they grow, the user-facing pitch against us becomes "you don't need FailProof if you've fully replaced the executor", which is true at the Bash boundary and false at the Edit/Write/MCP boundary. Have an honest comparison page ready (we already do this for Cordon, AgentPort).
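
For the blog-post angle, the "pre-parse the inline source and refuse blocked imports" option from the reply can be sketched in a few lines. This is a rough illustration, not how Capsule Bash or FailProof does it: `extractInlineEval`, `findBlockedImports`, and the `BLOCKED_MODULES` list are hypothetical names, and the regexes assume a single quoted argument with no shell escapes or expansion.

```typescript
// Hypothetical sketch: pre-parse `python3 -c` / `node -e` source for blocked imports.
type Runtime = "python" | "node";

interface InlineEval {
  runtime: Runtime;
  source: string;
}

// Illustrative blocklist (an assumption, not a vetted policy).
const BLOCKED_MODULES: Record<Runtime, string[]> = {
  python: ["shutil", "os", "subprocess"],
  node: ["fs", "child_process"],
};

// Extract the quoted inline source from a command string.
// Naive on purpose: one quoted argument, no escapes, no heredocs.
function extractInlineEval(command: string): InlineEval | null {
  const m = /\b(?:(python3?)\s+-c|(node)\s+-e)\s+(["'])([\s\S]*?)\3/.exec(command);
  if (m === null) return null;
  return { runtime: m[1] !== undefined ? "python" : "node", source: m[4] ?? "" };
}

// Return the blocked top-level modules that the inline source imports.
function findBlockedImports(ev: InlineEval): string[] {
  const pattern =
    ev.runtime === "python"
      ? /\b(?:import|from)\s+([A-Za-z_][\w.]*)/g
      : /(?:require\(\s*|\bfrom\s+)["'](?:node:)?([^"']+)["']/g;
  const hits: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(ev.source)) !== null) {
    // Reduce `os.path` or `fs/promises` to the top-level module name.
    const top = (m[1] ?? "").split(/[./]/)[0] ?? "";
    if (BLOCKED_MODULES[ev.runtime].indexOf(top) !== -1 && hits.indexOf(top) === -1) {
      hits.push(top);
    }
  }
  return hits;
}
```

The sketch also shows why this route is weak on its own: `__import__("shutil")`, `importlib`, or string-built `require` calls slip past any regex, which is the argument for pairing it with syscall-level interception or a hook that refuses inline eval outright.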

## Notes / findings

- Used the `browser-use` CLI subprocess form for parts of the read flow because `browser_extract_content` returned "No content extracted" twice on hn.algolia.com result pages (likely because the MCP env doesn't have an LLM key set for that primitive). MCP `browser_get_state` worked fine; the CLI eval path is the reliable fallback for structured DOM extraction.
- Algolia's "Past Week" filter on date-sorted searches surfaces a manageable result set (~5-20 items per query) that's fast to triage. Past-week + sort-by-popularity is overkill for fresh threads. Past-week + sort-by-date is the right default for this harness.
- Reply form rendered on this thread (`textarea[name=text]` present); thread is well within HN's reply window.
- Cross-thread duplicate check on `item?id=48009460` returned clean across `drafts/`, `comments/`, and open PRs.
- `browser-use close` resolves the MCP/CLI session-lock conflict (`Session 'default' is already running with different config`); standard recovery, no surprise.
- ASCII punctuation only in the reply body. No em-dashes, en-dashes, fancy ellipses, curly quotes, or unicode arrows.
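
The ASCII-only rule for the reply body is easy to check mechanically before posting. A minimal sketch, assuming the reply text is already in a string; `findNonAscii` is a hypothetical helper, not part of the harness:

```typescript
// Hypothetical pre-post check: list every non-ASCII code unit in the reply body.
interface NonAsciiHit {
  index: number; // UTF-16 code unit offset in the text
  char: string;
  code: string; // e.g. "U+2014"
}

function findNonAscii(text: string): NonAsciiHit[] {
  const hits: NonAsciiHit[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit > 0x7f) {
      const hex = ("0000" + unit.toString(16).toUpperCase()).slice(-4);
      hits.push({ index: i, char: text.charAt(i), code: "U+" + hex });
    }
  }
  return hits;
}
```

An empty result means the body is safe to paste; any hit gives the offset and code point to fix by hand.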