44 changes: 44 additions & 0 deletions drafts/2026-05-04T031239Z.md
@@ -0,0 +1,44 @@
# Reply to OP on "Show HN: BetterClaw - Compile a paragraph into a workflow that gates agent tools"

- **HN:** https://news.ycombinator.com/item?id=47973502
- **Story URL:** https://github.com/jfan22/BetterClaw
- **OP:** infamous-oven (poster)
- **Status:** draft (pending manual post)

## Discovery path

Browser-driven: opened `/ask` (no fit), `/show` (most adjacent items already covered by open PRs), then ran the Algolia search UI at `https://hn.algolia.com/?q=claude+code+hooks&type=story&dateRange=pastWeek` and skimmed the result list. BetterClaw surfaced as a Show HN explicitly soliciting design feedback in the same problem space (agent tool-call gating), with no prior PR coverage.

## Story / OP

Show HN of BetterClaw, a CLI that compiles a plain-English workflow paragraph into a directed graph where each node declares which tools are allowed at that step. A plugin hooks into the agent's tool-call path and blocks anything outside the graph before it reaches the MCP server. OP cites the PocketOS / Railway DB-deletion incident as the motivating example, and explicitly asks for feedback on:

- Is "paragraph -> graph" the right authoring model, or should it be YAML / a DSL?
- Where does this fall down? (Multi-step approvals? Loops? Sub-agents?)
- What other agent runtimes should they support?

2 points, 0 comments, 2 days old, reply form open. Adjacent product + OP solicits design discussion = passes the thread-fit gate per `INSTRUCTIONS.md`.
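To make the gating model described above concrete, here is a minimal sketch of a per-node tool allow-list. It is not BetterClaw's actual code or schema: the `Node` and `WorkflowGate` classes, the node names, and the tool names are all hypothetical, chosen only to illustrate "each node declares which tools are allowed at that step."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One workflow step: the tools it permits and the steps that may follow."""
    name: str
    allowed_tools: frozenset
    next_nodes: tuple = ()


@dataclass
class WorkflowGate:
    """Hypothetical gate: tracks the current step and checks tool calls against it."""
    nodes: dict
    current: str

    def check(self, tool: str) -> bool:
        # A call passes only if the current node lists the tool.
        return tool in self.nodes[self.current].allowed_tools

    def advance(self, target: str) -> None:
        # Only edges declared in the graph may be followed.
        if target not in self.nodes[self.current].next_nodes:
            raise ValueError(f"no edge {self.current} -> {target}")
        self.current = target


# Hypothetical two-step "diagnose, do not modify" workflow.
gate = WorkflowGate(
    nodes={
        "diagnose": Node("diagnose", frozenset({"read_file", "run_query"}), ("report",)),
        "report": Node("report", frozenset({"write_summary"})),
    },
    current="diagnose",
)
```

In this sketch a call such as `gate.check("drop_table")` returns `False` at every step, because no node lists that tool; whether a real runtime blocks, prompts, or logs on a failed check is a separate design choice.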

## My reply

```
(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)

The paragraph -> graph compile reads cleanly for bounded workflows like the "diagnose, do not modify" example. The harder case is exploratory work where the path can't be pre-described: debugging unfamiliar code, refactoring across files, anything with branching uncertainty. The graph then has to be permissive enough to allow legitimate exploration (and stops gating much) or strict enough that every off-path tool call needs a human re-compile.

A second axis of the same problem is invariant-shaped rather than workflow-shaped: "never DROP DATABASE on a prod connection string, regardless of which step the agent thinks it's on." Those resist the graph model because they're orthogonal to the task. They land more naturally as PreToolUse predicates that match on tool-input shape, not workflow position. The two layers feel complementary: paragraph-graph for "what should this run do," predicates for "what should the agent never do."
```
Comment on lines +24 to +30

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced reply block (Line 24).

This triggers markdownlint MD040 and is a quick fix.

<details>
<summary>Suggested patch</summary>

-```
+```text
 (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
 
 The paragraph -> graph compile reads cleanly for bounded workflows like the "diagnose, do not modify" example. The harder case is exploratory work where the path can't be pre-described: debugging unfamiliar code, refactoring across files, anything with branching uncertainty. The graph then has to be permissive enough to allow legitimate exploration (and stops gating much) or strict enough that every off-path tool call needs a human re-compile.
 
 A second axis of the same problem is invariant-shaped rather than workflow-shaped: "never DROP DATABASE on a prod connection string, regardless of which step the agent thinks it's on." Those resist the graph model because they're orthogonal to the task. They land more naturally as PreToolUse predicates that match on tool-input shape, not workflow position. The two layers feel complementary: paragraph-graph for "what should this run do," predicates for "what should the agent never do."
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @drafts/2026-05-04T031239Z.md around lines 24 - 30, add a language tag to the
fenced code block containing the quoted paragraph to satisfy markdownlint MD040:
replace the opening triple backticks with a language-specified fence (e.g.,
`text`, as in the suggested patch) so the parser recognizes it as a code/quote
block and the linter stops flagging MD040.

</details>

## Insight for the FailProof team

- BetterClaw is the closest direct adjacency we've seen in the Show HN stream so far: same hook-into-tool-call-path mechanism, same MCP/Claude Code surface, same PocketOS/Railway-incident motivation. The two products do not compete; they sit on different axes (workflow-bounded vs invariant-bounded). Worth tracking the OP (`infamous-oven`) and the repo (`jfan22/BetterClaw`) - if they ship sub-agent / loop support, the paragraph-graph model will need to either grow into composable sub-graphs or accept that invariant rules are the right layer for cross-cutting concerns. That's a natural collaboration / cross-link surface for a future blog post: "Workflow gates vs invariant gates: when each shape fits."
- The "compile English to graph" authoring model is interesting product positioning for FailProof: most of our policies today read like predicates (functions returning allow/deny/instruct), which is the right shape for invariants but a poor shape for "this run should follow steps A then B then C." If we ever add a workflow-recipe layer (say, a per-session "intended task graph"), the natural authoring affordance might be `failproofai run "in plain English what this session should do"`, with the model compiling a transient graph for the duration of that session. The BetterClaw repo is a reasonable reference point for what those ergonomics could look like.
- The OP's three solicited-feedback questions (paragraph vs DSL, multi-step/loops/sub-agents, runtimes-beyond-Claude-Code) map almost exactly to questions we should be answering in our own docs. Worth borrowing the structure: a "where this falls down" section in the README is good Show HN practice, and we don't currently have one.
- Cadence note: 0-comment Show HNs are low-visibility for the broader HN audience but high-visibility for the OP (who pings their own thread on every reply). For adjacent-product-discussion replies, OP-visibility is what matters; broader-thread visibility is a bonus. This thread shape is a fine fit for substantive engagement, and a bad fit for any reply that depends on third-party upvotes to surface it.
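As a rough illustration of the "predicate" shape discussed in these notes, the sketch below expresses the DROP DATABASE invariant as a function over the tool input that returns allow/deny/instruct. This is not FailProof AI's actual policy API: the `Decision` type, the `evaluate` helper, the `sql` and `connection` input keys, and the substring test for "prod" are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    action: str       # "allow", "deny", or "instruct"
    reason: str = ""


# Hypothetical predicate type: returns a Decision, or None for "no opinion".
Predicate = Callable[[str, dict], Optional[Decision]]

DROP_DATABASE = re.compile(r"\bDROP\s+DATABASE\b", re.IGNORECASE)


def no_drop_database_on_prod(tool: str, tool_input: dict) -> Optional[Decision]:
    """Invariant: matches on the shape of the input, not on workflow position."""
    sql = str(tool_input.get("sql", ""))                # assumed input key
    connection = str(tool_input.get("connection", ""))  # assumed input key
    # Naive assumption: a prod connection string contains the substring "prod".
    if DROP_DATABASE.search(sql) and "prod" in connection:
        return Decision("deny", "DROP DATABASE on a prod connection")
    return None


def evaluate(predicates: list, tool: str, tool_input: dict) -> Decision:
    """Return the first non-allow decision; allow when no predicate objects."""
    for predicate in predicates:
        decision = predicate(tool, tool_input)
        if decision is not None and decision.action != "allow":
            return decision
    return Decision("allow")
```

Because the predicate never consults the current step, it applies identically wherever the agent is in a run, which is the sense in which such rules are orthogonal to a workflow graph.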

## Notes / findings

- MCP `browser_navigate` failed on first call with "All connection attempts failed" - the documented launch-order trap from `INSTRUCTIONS.md`. Fell back to the `browser-use` CLI subprocess as the playbook prescribes, and completed the entire discovery + read flow that way without further issues.
- Algolia HN search UI takes ~5-7 seconds to render results into `article.Story` elements; reading the DOM too soon returns `n: 0`. Worked around with `sleep 6` after `open` before the eval.
- BetterClaw thread (id=47973502) reply form is present and unrestricted at 2 days old / 2 points. Fine for a draft target. The user's manual posting account will determine whether this comment lands and whether it gets any visibility.
- Cross-thread duplicate guard: skimmed PR #36 (Git Shield), #38 (bwrap+sshfs), #33 (Lightport), #41 (Spec27) for paraphrase risk. Each makes a different specific design-axis argument (latency, mechanism, stack-position, when-verified). This draft adds a fifth axis (workflow-shape vs invariant-shape) - distinct argument, no body-text reuse.
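
For the slow-render issue noted above, a bounded poll may be more robust than a fixed `sleep 6`. The helper below is a generic sketch, not part of the playbook or of `browser-use`: `poll_until` and its `probe` callback are hypothetical names, and the probe is assumed to return an empty result until the `article.Story` elements have rendered.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 15.0,
    interval: float = 0.5,
) -> Optional[T]:
    """Call probe() until it returns a truthy value or the timeout elapses.

    Returns the first truthy result, or None on timeout. probe() is always
    called at least once, even when timeout is 0.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

The probe would wrap whatever DOM read the session already performs; this returns as soon as results appear instead of always waiting the full six seconds, and gives up cleanly if the page never renders.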