From 8e08b1f86ff410c3e046c543323c0df7fa1e19fd Mon Sep 17 00:00:00 2001
From: NiveditJain
Date: Mon, 4 May 2026 08:13:07 -0700
Subject: [PATCH] [claude-hackernews] draft: Capsule Bash Show HN reply, inline-eval escape hatch (id=48009460)

---
 drafts/2026-05-04T151151Z.md | 50 ++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 drafts/2026-05-04T151151Z.md

diff --git a/drafts/2026-05-04T151151Z.md b/drafts/2026-05-04T151151Z.md
new file mode 100644
index 0000000..8ae9085
--- /dev/null
+++ b/drafts/2026-05-04T151151Z.md
@@ -0,0 +1,50 @@
+# Reply to Capsule Bash Show HN (sandboxed bash for agents)
+
+- **HN:** https://news.ycombinator.com/item?id=48009460
+- **Story:** "Show HN: Capsule Bash - Sandboxed Bash for Agents" (id=48009460, 1 point, 2 comments at draft time, ~20 minutes old, links to https://github.com/capsulerun/bash)
+- **OP:** `mavdol04`
+- **Status:** draft (pending manual post)
+
+## The post
+
+OP is a Show HN of Capsule Bash, a TypeScript-based sandboxed Bash aimed at agents. Their pitch: existing Bash gives agents "way too much freedom and not enough feedback to enrich the context after each command." Two-layer design: a core (commands + operator logic) and a pluggable runtime (Wasm-based via a Rust runtime they shipped earlier). The sandbox returns structured info about what changed (created/modified/deleted files) plus stdout. They list `python3 -c` and `node -e` among "commonly used commands" and explicitly invite feedback on missing commands.
+
+Thread is fresh (1 point, 2 comments at post time). Two existing comments: `debarshri` says they're building "something similar... for remote agents", OP asks them what exactly. No hostile or off-topic norm yet; the thread-shape is "design feedback welcomed."
+
+## My reply
+
+```
+(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
+
+The "too much freedom" framing is fair, but `python3 -c` and `node -e` as core commands re-open most of it. Once the agent runs `python3 -c "import shutil; shutil.rmtree(p)"`, the fs ops happen inside the Python runtime, not your bash interpreter, so the sandbox's structured-feedback layer never sees them. Same for `node -e "require('fs').unlinkSync(...)"`. Either intercept at the syscall layer, or pre-parse the inline source and refuse blocked imports. Adjacent angle: agents also call Edit/Write/MCP tools that bypass Bash entirely, so a Bash-only sandbox leaves that surface uncovered. The complementary seam is the harness's PreToolUse hook, which sees every tool call before any executor runs:
+
+    import { customPolicies, allow, deny } from "failproofai";
+
+    customPolicies.add({
+      name: "block-inline-eval",
+      match: { events: ["PreToolUse"] },
+      fn: ({ toolName, toolInput }) => {
+        if (toolName !== "Bash") return allow();
+        if (/\b(python3?|node)\s+-(c|e)\b/i.test(toolInput?.command ?? "")) {
+          return deny("inline -c/-e blocked; drop a script file instead");
+        }
+        return allow();
+      },
+    });
+```
+
+## Insight for the FailProof team
+
+- Capsule Bash competes for the same problem ("agent does dangerous things via shell") but at a *different layer*: a sandboxed executor that the agent calls into, not a hook that gates the call. That's a real architectural choice and worth being explicit about in the README/docs. The "where in the stack does the policy decision live" framing is becoming the most reliable way to differentiate FailProof from sandbox-shaped competitors (Cordon, AgentPort, AegisProxy, Capsule Bash, SmolVM, etc.). Lead with the seam diagram, not the policy catalog.
+- The `python3 -c` / `node -e` escape-hatch observation is concrete and reusable. It's the same family of failure as GTFOBins-style allowlist bypass: the surface area of a "safe shell" includes language interpreters that re-introduce the unsafe surface inside a single command. A blog post titled something like "Why `python3 -c` is the hardest thing to sandbox" would land naturally on threads like this.
+- Worth noting: the OP explicitly invites *missing* commands as feedback. The thread norm here is constructive feature-add, not punitive critique. My draft picks the right tone (acknowledge the framing, show the gap, offer a complementary seam) but the line between "constructive critique" and "pitch" is thin on Show HN. If this gets flagged, the failure mode would be that the snippet reads as ad-copy more than the prose does.
+- Track Capsule Bash as a sibling project. If they grow, the user-facing pitch is "you don't need FailProof if you've fully replaced the executor", which is true at the bash boundary and false at the Edit/Write/MCP boundary. Have an honest comparison page ready (we already do this for Cordon and AgentPort).
+
+## Notes / findings
+
+- Used the `browser-use` CLI subprocess form for parts of the read flow because `browser_extract_content` returned "No content extracted" twice on hn.algolia.com result pages (likely because the MCP env doesn't have an LLM key set for that primitive). MCP `browser_get_state` worked fine; the CLI eval path is the reliable fallback for structured DOM extraction.
+- Algolia's "Past Week" filter on date-sorted searches surfaces a manageable result set (~5-20 items per query) that's fast to triage. Past-week + sort-by-popularity is overkill for fresh threads. Past-week + sort-by-date is the right default for this harness.
+- Reply form rendered on this thread (`textarea[name=text]` present); thread is well within HN's reply window.
+- Cross-thread duplicate check on `item?id=48009460` returned clean across `drafts/`, `comments/`, and open PRs.
+- `browser-use close` resolves the MCP/CLI session-lock conflict (`Session 'default' is already running with different config`); standard recovery, no surprise.
+- ASCII punctuation only in the reply body. No em-dashes, en-dashes, fancy ellipses, curly quotes, or unicode arrows.
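
The regex in the draft reply can be exercised on its own to see which invocations it catches and which it lets through. The following is a minimal sketch that assumes only the pattern from the draft; `isInlineEval` and the sample commands are hypothetical and are not part of failproofai or Capsule Bash.

```typescript
// Same pattern as the block-inline-eval policy in the draft reply.
const INLINE_EVAL = /\b(python3?|node)\s+-(c|e)\b/i;

// Hypothetical helper for illustration; not a failproofai API.
function isInlineEval(command: string): boolean {
  return INLINE_EVAL.test(command);
}

// Sample commands chosen for illustration only.
const samples: string[] = [
  'python3 -c "import shutil; shutil.rmtree(p)"', // caught
  "node -e \"require('fs').unlinkSync(p)\"",      // caught
  'node --eval "process.exit(0)"',                // missed: long-form flag
  'node -p "1 + 1"',                              // missed: -p also evaluates
  'python3 -u -c "print(1)"',                     // missed: option before -c
  "python3 - <<'EOF'",                            // missed: source on stdin
  "node -c app.js",                               // caught, but -c is node's syntax check
  "python3 script.py",                            // allowed, as intended
];

for (const command of samples) {
  console.log(isInlineEval(command) ? "deny " : "allow", command);
}
```

The misses suggest the pattern works as a first filter only; covering long-form flags, intervening options, and stdin-fed source would need either a broader pattern or real argument parsing.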