diff --git a/drafts/2026-05-04T093024Z.md b/drafts/2026-05-04T093024Z.md new file mode 100644 index 0000000..6ac5bf3 --- /dev/null +++ b/drafts/2026-05-04T093024Z.md @@ -0,0 +1,56 @@ +# Reply draft: Open Bias Show HN, LLM-call seam vs tool-call seam + +- **HN:** https://news.ycombinator.com/item?id=47938497 +- **Parent comment:** https://news.ycombinator.com/item?id=47938512 (OP's own elaboration of the design rationale) +- **Story:** "Show HN: Open Bias - proxy that enforces agent behavior at runtime" (id=47938497, 21 points, 4 comments at draft time, 5 days old, repo at https://github.com/open-bias/open-bias/) +- **OP author:** `algomaniac` +- **Status:** draft (pending manual post) + +## Discovery + +Found via `https://hn.algolia.com/?q=Show+HN+agent&dateRange=pastWeek&type=story&page=0` while sweeping for adjacent agent-runtime-control launches not yet covered by an open PR on this repo. After verifying via `grep -rl "item?id=47938497" drafts/ comments/` (no hits) and `gh pr list --search "47938497"` (no hits), this thread is uncovered. The reply form on the story page is open. + +## OP's framing (algomaniac, id=47938512) + +> Hey HN, We spent the past year working on evals for teams running AI agents in production. We kept seeing rules that worked fine in evals stop working after a while (or miss inconsistently). And as teams added more rules, the agent started missing more of them overall. Evals and observability help, but the long tail finds you in prod anyway. Guardrails are like the side rails on the highway, useful, but you don't want to be hitting them often. We wanted the lane-keeping system that steers the agent as it deviates. So we built an open-source proxy that helps steer agents, catching and fixing violations before they reach users. Rules live in a RULES.md file (single source of truth for all policies). The thing we care most about is that the engines doing the checking are pluggable: ... + +So Open Bias is an LLM-call-layer proxy: it inspects the model's input/output, judges against RULES.md, and steers (corrects) the call before it propagates downstream. Aimed at chat-shaped or production-AI-agent deployments where output drift is the failure mode. + +## Why this thread passes the FailProof-mention gate + +This is a Show HN of a directly adjacent product (runtime agent-behavior enforcement) where the OP elaborates on a specific design choice (lane-keeping at the LLM seam, RULES.md as single source of truth, pluggable judge engines). They are explicitly soliciting design discussion in the comments. That matches CLAUDE.md gate row "Show HN of an adjacent product (sandbox, gateway, hook manager, policy engine) where the OP solicits design discussion: Yes, disclose affiliation, lead with substantive engagement on their design before mentioning FailProof." + +The reply leads with substantive engagement (LLM-call vs tool-call layer carry different failure modes) and ties one named FailProof policy to one concrete tool-call-shaped failure (DROP DATABASE in a migration). It does not include an install command, does not enumerate policies in a comma-list, does not talk about three-scope merge or `.failproofai/policies/`. The disclosure link is the only link. + +## My reply + +``` +(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai) + +Interesting choice to put the seam at the LLM call. We landed on the tool call, and the two layers catch different failure modes. Steering the model handles output drift; where it tends to miss is when the reasoning was fine but the tool input still does the wrong thing (a DROP DATABASE in a migration the LLM judged correct, an rm -rf with a path built from a stale env var, a force push that looked plausible to the proposer). At that point the LLM judge has already let it through; you want a hook running against the literal argv before exec. + +For coding agents we ship that as a small PreToolUse policy, e.g. warn-destructive-sql fires on a Bash whose argv contains DROP TABLE or DROP DATABASE. + +Curious whether RULES.md compiles to anything tool-call-shaped, or stays at the LLM seam? +``` + +Word count: ~143. Under the ~150-word cap. + +ASCII-only check: no em-dashes, en-dashes, fancy ellipses, curly quotes, or unicode arrows. Hyphens, semicolons, parens, periods, ASCII apostrophes only. + +## Insight for the FailProof team + +- **Composable, not competing.** Open Bias and FailProof sit at different layers (LLM-call vs tool-call). A reasonable customer architecture is both: Open Bias judges the model's reasoning before the agent dispatches, FailProof clamps tool args at the call site. A blog post titled something like "Why agent guardrails want both an LLM seam and a tool seam" would land naturally and could cite Open Bias as a complementary product. That stance also pre-empts the framing that FailProof is competing with Open Bias. +- **RULES.md as a draw.** The single-source-of-truth markdown file lands well with the HN audience (cf. Cohorte's YAML approach we covered in PR #49 - the same instinct, more declarative). FailProof's `.failproofai/policies/` directory of `*.{js,mjs,ts}` files is the code-as-policy answer. The doc story should make this trade-off explicit: when do you want a flat document, when do you want a typed function with `allow`/`deny`/`instruct` returns. +- **"Long tail finds you in prod anyway"** is the OP's strongest line and a quotable framing. FailProof's positioning around "deterministic, no model hop" cuts directly against the long-tail cost curve. Worth A/B-testing that phrasing in the README's intro section. +- **Thread audience.** algomaniac mentions cost breakdown is forthcoming; that suggests the OP is open to followups. If the substantive design comment lands well, a measured follow-up about how the two layers compose (with a docs link, not a pitch) could be appropriate after merge. Not part of this draft. +- **Marketing recurrence.** Searches like `Show HN agent` filtered to past week catch new adjacent-product launches reliably. We've covered ~40 in flight; the class is still emerging fast. Keep this rotation. + +## Notes / findings + +- **Browser-use MCP wedged on port 9333 (Reddit harness).** As documented in INSTRUCTIONS.md, `~/.config/browseruse/config.json` has `cdp_url: http://127.0.0.1:9333`, so the MCP tries to connect to the Reddit port and fails. The CLI subprocess form against `--cdp-url http://127.0.0.1:9334` worked end-to-end for navigate / eval / state. No need to touch the global config for this run. +- **Initial CLI session conflict.** First `$BU open` returned `Session 'default' is already running with different config. Run browser-use close first.` Resolved by `$BU close` and re-issuing. INSTRUCTIONS.md already documents this exact failure mode. +- **failproofai PreToolUse:Bash false-positive.** The base64 dance for HN URLs and the CDP URL kept working. No new triggers in this run. +- **Reply form on `/item?id=47938497`** is present (`hasReply: true` from the eval probe). The thread is 5 days old; HN's reply window stayed open. +- **Cross-thread duplicate guard.** The reply text is materially different from prior FailProof-mentioning drafts in this repo: the angle is the LLM-call vs tool-call layer distinction, framed as a design comparison with Open Bias's runtime proxy. The named policy (`warn-destructive-sql`) was used in `comments/2026-04-29T043958Z.md`, but for a different audience (a delegate-comment reply on the prod-DB thread, with a custom policy snippet); reusing the policy name here is fine because the comment is short, names exactly one policy, and ties it to a different framing (layer-distinction, not custom-snippet). +- **Algolia search rotation.** Used `Show HN agent` past-week as the discovery query. Prior session likely used `agent rm -rf` / `agent deleted files` / `cursor ruined`. No repeat of an exact query within this account's recent search history.