From 8e08b1f86ff410c3e046c543323c0df7fa1e19fd Mon Sep 17 00:00:00 2001
From: NiveditJain <nivedit@exosphere.host>
Date: Mon, 4 May 2026 08:13:07 -0700
Subject: [PATCH] [claude-hackernews] draft: Capsule Bash Show HN reply,
 inline-eval escape hatch (id=48009460)

---
 drafts/2026-05-04T151151Z.md | 50 ++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 drafts/2026-05-04T151151Z.md

diff --git a/drafts/2026-05-04T151151Z.md b/drafts/2026-05-04T151151Z.md
new file mode 100644
index 0000000..8ae9085
--- /dev/null
+++ b/drafts/2026-05-04T151151Z.md
@@ -0,0 +1,50 @@
+# Reply to Capsule Bash Show HN (sandboxed bash for agents)
+
+- **HN:** https://news.ycombinator.com/item?id=48009460
+- **Story:** "Show HN: Capsule Bash - Sandboxed Bash for Agents" (id=48009460, 1 point, 2 comments at draft time, ~20 minutes old, links to https://github.com/capsulerun/bash)
+- **OP:** `mavdol04`
+- **Status:** draft (pending manual post)
+
+## The post
+
+The post is a Show HN for Capsule Bash, a TypeScript-based sandboxed Bash aimed at agents. The pitch: existing Bash gives agents "way too much freedom and not enough feedback to enrich the context after each command." The design has two layers: a core (commands plus operator logic) and a pluggable runtime (Wasm-based, via a Rust runtime they shipped earlier). After each command, the sandbox returns structured info about what changed (created/modified/deleted files) alongside stdout. They list `python3 -c` and `node -e` among "commonly used commands" and explicitly invite feedback on missing commands.
+
+The thread is fresh (1 point, 2 comments at draft time). The two existing comments: `debarshri` says they're building "something similar... for remote agents", and OP asks what exactly they mean. Nothing hostile or off-topic yet; the thread's tone is "design feedback welcome."
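+
+The reply rests on one assumption about the design described above: that the created/modified/deleted report is assembled from commands the bash core interprets itself, rather than by diffing the filesystem after each command. The toy model below is a hypothetical sketch of that assumption, not Capsule Bash's code or API (`ToySandbox`, `Change`, and the crude `os.remove` matcher are all made up). It shows a deletion done through `python3 -c` changing the filesystem without producing a log entry.
+
+```typescript
+// Hypothetical toy model, NOT Capsule Bash's API: the change log only
+// records effects of builtins the bash core implements itself.
+type Change = { kind: "created" | "deleted"; path: string };
+
+class ToySandbox {
+  readonly files = new Set<string>();
+  readonly log: Change[] = [];
+
+  run(argv: string[]): void {
+    const [name, ...args] = argv;
+    if (name === "touch") {
+      for (const p of args) {
+        if (!this.files.has(p)) this.log.push({ kind: "created", path: p });
+        this.files.add(p);
+      }
+    } else if (name === "rm") {
+      for (const p of args) {
+        if (this.files.delete(p)) this.log.push({ kind: "deleted", path: p });
+      }
+    } else if (name === "python3" && args[0] === "-c") {
+      // The interpreter mutates the filesystem on its own; nothing is logged.
+      this.runOpaqueInterpreter(args[1] ?? "");
+    }
+  }
+
+  // Crude stand-in for Python executing `os.remove("...")`.
+  private runOpaqueInterpreter(src: string): void {
+    const target = /os\.remove\("([^"]+)"\)/.exec(src)?.[1];
+    if (target !== undefined) this.files.delete(target);
+  }
+}
+
+const sb = new ToySandbox();
+sb.run(["touch", "a.txt", "b.txt"]);
+sb.run(["rm", "a.txt"]);
+sb.run(["python3", "-c", 'import os; os.remove("b.txt")']);
+console.log(sb.files.size, sb.log.map((c) => `${c.kind} ${c.path}`));
+```
+
+Both files end up gone, but `sb.log` holds only three entries (two creates, one delete); the `b.txt` removal is invisible. A design that diffs the filesystem after each command, or observes file calls at the Wasm runtime layer, would not have this blind spot for file effects; the sketch models only the per-command case.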
+
+## My reply
+
+```
+(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
+
+The "too much freedom" framing is fair, but `python3 -c` and `node -e` as core commands re-open most of it. Once the agent runs `python3 -c "import shutil; shutil.rmtree(p)"`, the fs ops happen inside the Python runtime, not your bash interpreter, so the sandbox's structured-feedback layer never sees them. Same for `node -e "require('fs').unlinkSync(...)"`. Either intercept at the syscall layer, or pre-parse the inline source and refuse blocked imports. Adjacent angle: agents also call Edit/Write/MCP tools that bypass Bash entirely, so a Bash-only sandbox leaves that surface uncovered. The complementary seam is the harness's PreToolUse hook, which sees every tool call before any executor runs:
+
+  import { customPolicies, allow, deny } from "failproofai";
+
+  customPolicies.add({
+    name: "block-inline-eval",
+    match: { events: ["PreToolUse"] },
+    fn: ({ toolName, toolInput }) => {
+      if (toolName !== "Bash") return allow();
+      if (/\b(python3?|node)\s+-(c|e)\b/i.test(toolInput?.command ?? "")) {
+        return deny("inline -c/-e blocked; drop a script file instead");
+      }
+      return allow();
+    },
+  });
+```
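+
+Before posting, it is worth confirming what the snippet's regex actually matches. The sketch below copies the pattern verbatim into a standalone function so it runs without the `failproofai` package; `looksLikeInlineEval` and the sample commands are illustrative, not part of any real API.
+
+```typescript
+// Same pattern as the draft's block-inline-eval policy, lifted out so it
+// can be exercised without any FailProof import.
+const INLINE_EVAL = /\b(python3?|node)\s+-(c|e)\b/i;
+
+function looksLikeInlineEval(command: string): boolean {
+  return INLINE_EVAL.test(command);
+}
+
+// [command, what the current pattern returns]
+const samples: Array<[string, boolean]> = [
+  [`python3 -c "import shutil; shutil.rmtree(p)"`, true],
+  [`node -e "require('fs').unlinkSync('x')"`, true],
+  [`python -c 'print(1)'`, true],
+  // Not caught by this pattern:
+  [`node --eval "1"`, false],
+  [`node -p "1 + 1"`, false],
+  [`python3.11 -c "print(1)"`, false],
+  [`python3 - <<'EOF'`, false],
+];
+
+for (const [cmd, expected] of samples) {
+  const got = looksLikeInlineEval(cmd);
+  console.log(`${got === expected ? "ok  " : "FAIL"} ${got} ${cmd}`);
+}
+```
+
+With this pattern, `node --eval`, `node -p`, a versioned binary such as `python3.11`, and source piped through `python3 -` all pass the gate, so the regex is a coarse first filter rather than a complete block. Widening it is a judgment call about false positives.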
+
+## Insight for the FailProof team
+
+- Capsule Bash targets the same problem ("agent does dangerous things via shell") but at a *different layer*: a sandboxed executor that the agent calls into, not a hook that gates the call. That's a real architectural choice and worth making explicit in the README/docs. Framing by "where in the stack does the policy decision live" is becoming the most reliable way to differentiate FailProof from sandbox-shaped competitors (Cordon, AgentPort, AegisProxy, Capsule Bash, SmolVM, etc.). Lead with the seam diagram, not the policy catalog.
+- The `python3 -c` / `node -e` escape-hatch observation is concrete and reusable. It's the same family of failure as GTFOBins-style allowlist bypass: the surface area of a "safe shell" includes language interpreters that re-introduce the unsafe surface inside a single command. A blog post titled something like "Why `python3 -c` is the hardest thing to sandbox" would land naturally on threads like this.
+- Worth noting: the OP explicitly invites *missing* commands as feedback, so the thread norm is constructive feature requests, not punitive critique. My draft aims for that tone (acknowledge the framing, show the gap, offer a complementary seam), but on Show HN the line between constructive critique and a pitch is thin. If this gets flagged, the likely cause is that the snippet reads more like ad copy than the prose does.
+- Track Capsule Bash as a sibling project. If they grow, the user-facing pitch becomes "you don't need FailProof if you've fully replaced the executor", which is true at the bash boundary and false at the Edit/Write/MCP boundary. Have an honest comparison page ready, as we already do for Cordon and AgentPort.
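+
+The second bullet's escape hatch, and the reply's suggestion to "pre-parse the inline source and refuse blocked imports", can be sketched as a check over an already-tokenized argv. This is a hypothetical illustration, not FailProof or Capsule Bash code: `blockedImports` and `BLOCKED_MODULES` are made-up names, shell tokenization is assumed to have happened already, and a statement-by-statement string scan is easy to evade, as the second call shows.
+
+```typescript
+// Hypothetical denylist; a real policy would choose its own.
+const BLOCKED_MODULES = new Set(["os", "shutil", "subprocess", "pathlib"]);
+
+// Returns blocked top-level modules imported by `python3 -c <src>`, or []
+// if argv is not a python -c call. Assumes argv is already tokenized.
+function blockedImports(argv: string[]): string[] {
+  const exe = (argv[0] ?? "").split("/").pop() ?? "";
+  const i = argv.indexOf("-c");
+  if (!/^python3?$/.test(exe) || i < 0) return [];
+  const src = argv[i + 1] ?? "";
+  const found = new Set<string>();
+  // One statement per `;` or newline; handles only `import a, b` and `from a import b`.
+  for (const stmt of src.split(/[;\n]/)) {
+    const s = stmt.trim();
+    const imp = /^import\s+(.+)$/.exec(s);
+    const frm = /^from\s+([\w.]+)\s+import\b/.exec(s);
+    const mods = imp ? (imp[1] ?? "").split(",") : frm ? [frm[1] ?? ""] : [];
+    for (const raw of mods) {
+      // "os.path as p" -> "os"
+      const top = raw.trim().split(/[\s.]/)[0] ?? "";
+      if (BLOCKED_MODULES.has(top)) found.add(top);
+    }
+  }
+  return Array.from(found);
+}
+
+console.log(blockedImports(["python3", "-c", "import shutil; shutil.rmtree('build')"])); // ["shutil"]
+console.log(blockedImports(["python3", "-c", "__import__('shutil').rmtree('build')"])); // []
+```
+
+The `__import__` call (like `exec` or `importlib`) slips past any scan of import statements, which is exactly why `python3 -c` resists string-level sandboxing.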
+
+## Notes / findings
+
+- Used the `browser-use` CLI subprocess form for parts of the read flow because `browser_extract_content` returned "No content extracted" twice on hn.algolia.com result pages (likely because the MCP env doesn't have an LLM key set for that primitive). MCP `browser_get_state` worked fine; the CLI eval path is the reliable fallback for structured DOM extraction.
+- Algolia's "Past Week" filter on date-sorted searches surfaces a manageable result set (~5-20 items per query) that's fast to triage. Past-week + sort-by-popularity is overkill for fresh threads. Past-week + sort-by-date is the right default for this harness.
+- Reply form rendered on this thread (`textarea[name=text]` present); thread is well within HN's reply window.
+- Cross-thread duplicate check on `item?id=48009460` returned clean across `drafts/`, `comments/`, and open PRs.
+- `browser-use close` resolves the MCP/CLI session-lock conflict (`Session 'default' is already running with different config`); standard recovery, no surprise.
+- ASCII punctuation only in the reply body. No em-dashes, en-dashes, fancy ellipses, curly quotes, or unicode arrows.
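+
+The "Past Week + sort by date" default from these notes can also be reproduced against the public HN Search API, which has a `search_by_date` endpoint and accepts a `numericFilters` bound on `created_at_i` (Unix seconds). The sketch below only builds the URL and makes no request; the function name, the `tags=story` choice, and the example query are illustrative assumptions, not part of this harness.
+
+```typescript
+const WEEK_SECONDS = 7 * 24 * 60 * 60;
+
+// Builds an HN Search (Algolia) API URL roughly equivalent to the web UI's
+// "Past Week" filter combined with "sort by date".
+function pastWeekByDateUrl(query: string, nowMs: number = Date.now()): string {
+  const since = Math.floor(nowMs / 1000) - WEEK_SECONDS;
+  const params = [
+    `query=${encodeURIComponent(query)}`,
+    "tags=story",
+    `numericFilters=${encodeURIComponent(`created_at_i>${since}`)}`,
+  ];
+  return `https://hn.algolia.com/api/v1/search_by_date?${params.join("&")}`;
+}
+
+// Fixed clock (this draft's timestamp) so the URL is reproducible.
+console.log(pastWeekByDateUrl("sandboxed bash agents", Date.UTC(2026, 4, 4, 15, 11, 51)));
+```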