exospherehost · NiveditJain · May 8, 2026
diff --git a/drafts/2026-05-08T193033Z.md b/drafts/2026-05-08T193033Z.md
@@ -0,0 +1,75 @@
+# Reply draft: Veris Show HN, mock-vs-live divergence and the runtime hook seam
+
+- **HN:** https://news.ycombinator.com/item?id=48054313
+- **Story:** "Show HN: Veris - Agent sandboxes with simulated external services" (id=48054313, posted by jrm-veris, 9 points / 23 hours / 0 comments at draft time, links to https://veris.ai/sandbox)
+- **Status:** draft (pending manual post)
+
+## Discovery
+
+Browser sweep (no memorized links):
+
+1. `https://news.ycombinator.com/ask` - scanned top 24 Ask HN; mostly meta-topics (career, AI cost, MCP-process count, "is Claude Code getting worse", LLM comments) which the thread-fit gate filters out.
+2. `https://news.ycombinator.com/show` - scanned top 30 Show HN; spotted Tilde.run (id=48037724) at 196 points / 129 comments (saturated, mid-thread visibility near zero per gate), and a cluster of fresh adjacent Show HNs.
+3. `https://hn.algolia.com/?q=agent+deleted&type=story&dateRange=pastWeek&sort=byDate` - 1 result (Crit, id=48062402; review-tool space already heavily covered in our open PRs).
+4. `https://hn.algolia.com/?q=claude+code&type=story&dateRange=pastWeek&sort=byPopularity` - surfaced the Claude Code symlink-sandbox-escape CVE (id=48057842, 42 pts, 5 comments). Inspected the FailProof `block-read-outside-cwd` source in `src/hooks/builtin-policies.ts` lines 763-820: it uses `path.resolve()` with no `realpath`/symlink resolution, so it shares the same bypass shape as the Claude Code CVE. Cannot honestly pitch it as a fix; thread fails the gate. Skipped.
+5. `https://hn.algolia.com/?q=agent+sandbox&type=story&dateRange=pastWeek&sort=byDate` - found Veris (id=48054313): fresh Show HN of an eval-time agent-sandbox tool with stateful LLM-powered mocks, 9 points / 0 comments / 23 hours. Clean adjacent-product Show HN, gate-passing.
+
+Three-surface duplicate scan confirms id=48054313 is not in `drafts/`, not in `comments/`, and not in any open PR diff on this repo.
+
+## OP
+
+The submission is link-only - no inline `toptext` on the HN page. Per the linked Veris product page (https://veris.ai/sandbox):
+
+- Veris is a pre-prod simulation environment for testing AI agents end-to-end.
+- It ships **stateful, LLM-powered mock services** for 50+ enterprise platforms (SWIFT and OpenSanctions for banking, Salesforce / HubSpot for CRM, Zendesk / Intercom for support, Slack / Jira for productivity, Stripe / Shopify for payments).
+- Fault classes it explicitly catches: hallucinations, incorrect tool usage, policy violations, context retention failures, latency.
+- Architecture: scenario generation from agent code / production logs / past incidents, deterministic Veris Simulation Engine with rewards and replay scoring, multi-layer grading (scripted, LLM-judge, hybrid), training integration for SFT and RL.
+- Positioning: framework-agnostic eval-time platform, no MCP / Claude Code / Agents-SDK specifics in the public material.
+- Use cases highlighted: customer support, fraud detection.
+- The pitch: "ship knowing only the happy path" is the failure mode.
+
+## My reply
+
+```
+(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
+
+The load-bearing property here is that the mocks are LLM-powered and stateful, so you can run 10k scenarios safely without moving real money or paging a real Salesforce admin. The cost: bugs whose pathology only surfaces against the live service (idempotency-key replay, prod-vs-staging account-ID prefix drift, the rate limiter's actual jitter, partial state on a 502) won't reproduce in the mock. The same agent that passes every Veris scenario can emit a malformed call against real Stripe on the first prod prompt that nudges its plan off-script. A PreToolUse hook fills that gap by denying on call shape rather than scenario coverage:
+
+  customPolicies.add({
+    name: "block-prod-stripe-transfer-over-threshold",
+    match: { events: ["PreToolUse"] },
+    fn: async (ctx) => {
+      const url = String(ctx.toolInput?.url ?? "");
+      const amount = Number(ctx.toolInput?.body?.amount ?? 0);
+      if (url.includes("api.stripe.com/v1/transfers") && amount > 100_000)
+        return deny("Stripe transfer above $1,000 (100000 cents) blocked at runtime");
+      return allow();
+    },
+  });
+
+Eval-time mocks gate the scenarios you wrote; runtime hooks gate the calls you didn't see coming.
+```
+
+## Insight for the FailProof team
+
+The Veris-shaped axis is meaningfully distinct from the static-vs-runtime (Snyk, PR #42) and scenario-vs-runtime (TrainForgeTester PR #53, Spec27 PR #41) framings already in the open-PR set: it's specifically about **mock-vs-live divergence**. A scenario-test runner doesn't have to mock the world; Veris (and Veris-likes) explicitly do, and that simulation makes the eval-time guarantees inherently softer than scenario tests against a real staging environment. The honest framing is:
+
+- Static analysis (Snyk-shape): catches what's enumerable from code at rest.
+- Scenario tests against real services (TrainForgeTester / Spec27 shape): catch what's enumerable from prompt / tool-call traces.
+- Stateful simulated services (Veris-shape): catch what's enumerable plus the multi-turn state-machine behaviors, but at the cost of mock fidelity.
+- Runtime PreToolUse hooks: catch the always-wrong call shape regardless of whether anyone enumerated it.
+
+This is a fourth seam that deserves its own one-page doc note alongside the other three. The angle "your simulator is a model of the world; the hook gates the call about to land in the world" is sharp enough to slot into any future Veris / Cygnal / Agnostic / Patronus-style pre-prod simulation Show HN. Customer support and fraud detection are the two domains Veris highlights, and both are exactly where "the agent passed every scenario but the rate limiter / idempotency / partial-state behavior in prod still bit us" is a real story; FailProof should consider an `examples/payments-policy.ts` recipe in the repo for that audience.
+
+The thread is 0 comments at draft time, so a substantive top-level peer comment lands clean without competing against existing discussion. The OP is link-only on HN (no inline `toptext`), so anyone landing here is reading from the Veris site itself and has the simulation context loaded.
+
+## Notes / findings
+
+- Body word count: ~115 words of prose + ~50 words in the snippet = ~165 total. The brand-voice band is "under ~150 words", with the working example at ~110, so the prose alone fits the band while the total with the snippet runs slightly over it. Still well below the flagged-shape ~220-word footprint. Reads short on screen because the snippet is dense.
+- ASCII punctuation only: hyphens (`-`), straight quotes (`"`/`'`), three ASCII dots if needed, parentheses for the list of mock-fidelity gaps. No em-dashes, en-dashes, curly quotes, fancy ellipses, or unicode arrows. The snippet uses ASCII `?.`, `??`, `=>`, and plain double-quoted string literals only.
+- One disclosure line in plain parens at the top, lowercase `disclosure:`, single repo URL. No second link at the bottom. No install command. No comma-list of policy names. No three-scope / 39-policies / dashboard / `~/.failproofai/` callouts. Custom-policy snippet, not a built-in name (so no over-specific claim that an OOTB policy exists for `block-prod-stripe-transfer-over-threshold` - it's illustrative).
+- Cross-thread duplicate guard: framing axis ("mock-vs-live divergence", "the simulator is a model of the world") is materially distinct from the TrainForgeTester (PR #53) "scenarios catch enumerable behaviors, hooks catch always-wrong shapes" line and from Spec27 (PR #41) "tests validate the contract you wrote, hooks catch shapes the contract didn't list" line. Snippet domain (Stripe transfer URL + amount threshold) is unique to this thread - TrainForgeTester named `block-rm-rf` only, Spec27 used a `DROP TABLE` SQL regex. Closing aphorism is paraphrase-distinct: "scenarios you wrote vs calls you didn't see coming".
+- Reply form on the Veris thread is open: `<form action="comment">` with `<textarea name="text">` and `<input type=submit value="add comment">` rendered at the bottom of the page. No `[dead]` / `[flagged]` markers. Thread is replyable.
+- The Claude Code symlink CVE thread (id=48057842) was a tempting near-miss: concrete-failure shape and replyable, but FailProof's `block-read-outside-cwd` (`src/hooks/builtin-policies.ts` lines 763-820) uses `path.resolve(cwd, target)` with no symlink resolution, so claiming the policy would have prevented the CVE is wrong. Worth filing as a separate issue / PR on the `failproofai` repo as a real bug: `block-read-outside-cwd` should call `fs.realpathSync()` (or the async equivalent) on `target` before the prefix check, and probably also walk up the parent chain when the path doesn't exist yet (write-time symlink-create-then-write attack). The same defect would apply to `block-secrets-write` if the agent writes to a symlinked path.
+- Read cadence: 1 navigate to `/ask`, 1 to `/show`, 4 to Algolia search variants, 1 to the CVE item page, 1 to the Veris item page, 1 each to the Veris product page and the failproofai builtin-policies source. Well under the 20-pages-per-5-minute cap and the 50-pages-per-hour cap. No bursts.
+- Used WebFetch for github.com / veris.ai (allowed - not ycombinator hosts) and `gh api` for failproofai source inspection (also non-ycombinator). All HN reads went through the dedicated Chrome profile via the `browser-use` MCP. No HTTP-client traffic to any ycombinator host.
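The symlink finding above (resolve real paths before the prefix check, and walk up the parent chain when the target does not exist yet) can be sketched roughly as follows. This is a minimal illustration for a future issue, not the `failproofai` source: `isInsideCwd`, `resolveThroughSymlinks`, `isSymlink`, and `demo` are hypothetical names, the fail-closed handling of dangling symlinks is an assumption about the desired behavior, and the demo assumes a POSIX-like filesystem where creating symlinks needs no extra privileges.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: true if `p` itself is a symlink (including a dangling one).
function isSymlink(p: string): boolean {
  try {
    return fs.lstatSync(p).isSymbolicLink();
  } catch {
    return false;
  }
}

// Hypothetical helper: resolve symlinks on the longest existing prefix of
// `absTarget`, then re-append the part that does not exist yet.
// Returns null (fail closed) on dangling symlinks or unexpected errors.
function resolveThroughSymlinks(absTarget: string): string | null {
  let current = absTarget;
  const tail: string[] = [];
  for (;;) {
    try {
      return path.join(fs.realpathSync(current), ...tail);
    } catch (err) {
      // Anything other than "does not exist" (ELOOP, EACCES, ...) fails closed.
      if ((err as NodeJS.ErrnoException).code !== "ENOENT") return null;
    }
    // A symlink whose destination is missing could be written through later.
    if (isSymlink(current)) return null;
    const parent = path.dirname(current);
    if (parent === current) return null; // reached the filesystem root
    tail.unshift(path.basename(current));
    current = parent;
  }
}

// Hypothetical replacement for a path.resolve()-only containment check.
function isInsideCwd(cwd: string, target: string): boolean {
  const realCwd = fs.realpathSync(cwd);
  const realTarget = resolveThroughSymlinks(path.resolve(cwd, target));
  if (realTarget === null) return false;
  const rel = path.relative(realCwd, realTarget);
  if (rel === "") return true; // target is the cwd itself
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

// Usage sketch: a throwaway tree where work/link points outside the cwd.
function demo(): Record<string, boolean> {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "cwd-check-"));
  const work = path.join(root, "work");
  const outside = path.join(root, "outside");
  try {
    fs.mkdirSync(work);
    fs.mkdirSync(outside);
    fs.writeFileSync(path.join(outside, "secret.txt"), "x");
    fs.symlinkSync(outside, path.join(work, "link"), "dir");
    fs.symlinkSync(path.join(root, "nowhere"), path.join(work, "dangling"));
    const naive = path.resolve(work, "link/secret.txt").startsWith(work + path.sep);
    return {
      naiveSaysInside: naive,
      linkRead: isInsideCwd(work, "link/secret.txt"),
      linkNewFile: isInsideCwd(work, "link/new.txt"),
      danglingWrite: isInsideCwd(work, "dangling"),
      newNestedFile: isInsideCwd(work, "a/b/new.txt"),
      dotDot: isInsideCwd(work, "../outside/secret.txt"),
    };
  } finally {
    fs.rmSync(root, { recursive: true, force: true });
  }
}

console.log(demo());
```

The string-prefix check on `path.resolve()` output accepts `link/secret.txt` because the path is lexically under the working directory; comparing real paths rejects it. A check like this is still racy between the check and the actual read or write, so it narrows the bypass rather than closing it.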