exospherehost · NiveditJain · May 4, 2026 · coderabbitai · May 4, 2026
diff --git a/drafts/2026-05-04T031239Z.md b/drafts/2026-05-04T031239Z.md
@@ -0,0 +1,44 @@
+# Reply to OP on "Show HN: BetterClaw - Compile a paragraph into a workflow that gates agent tools"
+
+- **HN:** https://news.ycombinator.com/item?id=47973502
+- **Story URL:** https://github.com/jfan22/BetterClaw
+- **OP:** infamous-oven (poster)
+- **Status:** draft (pending manual post)
+
+## Discovery path
+
+Browser-driven: opened `/ask` (no fit), `/show` (most adjacent items already covered by open PRs), then ran the Algolia search UI at `https://hn.algolia.com/?q=claude+code+hooks&type=story&dateRange=pastWeek` and skimmed the result list. BetterClaw surfaced as a Show HN explicitly soliciting design feedback in the same problem space (agent tool-call gating), with no prior PR coverage.
+
+## Story / OP
+
+Show HN of BetterClaw, a CLI that compiles a plain-English workflow paragraph into a directed graph where each node declares which tools are allowed at that step. A plugin hooks into the agent's tool-call path and blocks anything outside the graph before it reaches the MCP server. OP cites the PocketOS / Railway DB-deletion incident as the motivating example, and explicitly asks for feedback on:
+
+- Is "paragraph -> graph" the right authoring model, or should it be YAML / a DSL?
+- Where does this fall down? (Multi-step approvals? Loops? Sub-agents?)
+- What other agent runtimes should they support?
+
+2 points, 0 comments, 2 days old, reply form open. Adjacent product + OP solicits design discussion = passes the thread-fit gate per `INSTRUCTIONS.md`.
+
+## My reply
+
+```
+(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
+
+The paragraph -> graph compile reads cleanly for bounded workflows like the "diagnose, do not modify" example. The harder case is exploratory work where the path can't be pre-described: debugging unfamiliar code, refactoring across files, anything with branching uncertainty. The graph then has to be either permissive enough to allow legitimate exploration (at which point it stops gating much) or strict enough that every off-path tool call needs a human re-compile.
+
+A second axis of the same problem is invariant-shaped rather than workflow-shaped: "never DROP DATABASE on a prod connection string, regardless of which step the agent thinks it's on." Those resist the graph model because they're orthogonal to the task. They land more naturally as PreToolUse predicates that match on tool-input shape, not workflow position. The two layers feel complementary: paragraph-graph for "what should this run do," predicates for "what should the agent never do."
+```
+
+## Insight for the FailProof team
+
+- BetterClaw is the closest direct adjacency we've seen in the Show HN stream so far: same hook-into-tool-call-path mechanism, same MCP/Claude Code surface, same PocketOS/Railway-incident motivation. The two products are not competitive; they sit on different axes (workflow-bounded vs invariant-bounded). Worth tracking the OP (`infamous-oven`) and the repo (`jfan22/BetterClaw`) - if they ship sub-agent / loop support, the paragraph-graph model will need to either grow into composable sub-graphs or accept that invariant rules are the right layer for cross-cutting concerns. That's a natural collaboration / cross-link surface for a future blog post: "Workflow gates vs invariant gates: when each shape fits."
+- The "compile English to graph" authoring model is interesting product-positioning for FailProof: most of our policies today read like predicates (functions returning allow/deny/instruct), which is the right shape for invariants but a poor shape for "this run should follow steps A then B then C." If we ever add a workflow-recipe layer (say, a per-session "intended task graph"), the natural authoring affordance might be `failproofai run "in plain English what this session should do"` and have the model compile a transient graph for the duration of that session. The BetterClaw repo is a reasonable reference point for what those ergonomics could look like.
+- The OP's three solicited-feedback questions (paragraph vs DSL, multi-step/loops/sub-agents, runtimes-beyond-Claude-Code) map almost exactly to questions we should be answering in our own docs. Worth borrowing the structure: a "where this falls down" section in the README is good Show HN practice, and we don't currently have one.
+- Cadence note: 0-comment Show HNs are low-visibility for the broader HN audience but high-visibility for the OP (who watches their own thread for replies). For competing-product-discussion replies, OP-visibility is what matters; broader-thread visibility is a bonus. This thread shape is a fine fit for substantive engagement and a bad fit for any reply that depends on third-party upvotes to surface it.
+
+## Notes / findings
+
+- MCP `browser_navigate` failed on first call with "All connection attempts failed" - the documented launch-order trap from `INSTRUCTIONS.md`. Fell back to the `browser-use` CLI subprocess as the playbook prescribes and completed the entire discovery + read flow that way without further issues.
+- Algolia HN search UI takes ~5-7 seconds to render results into `article.Story` elements; reading the DOM too soon returns `n: 0`. Worked around with `sleep 6` after `open` before the eval.
+- BetterClaw thread (id=47973502) reply form is present and unrestricted at 2 days old / 2 points. Fine for a draft target. The user's manual posting account will determine whether this comment lands and whether it gets any visibility.
+- Cross-thread duplicate guard: skimmed PR #36 (Git Shield), #38 (bwrap+sshfs), #33 (Lightport), #41 (Spec27) for paraphrase risk. Each makes a different specific design-axis argument (latency, mechanism, stack-position, when-verified). This draft adds a fifth axis (workflow-shape vs invariant-shape) - distinct argument, no body-text reuse.