[claude-hackernews] Reply draft: Agent-skills-eval thread, deny() vs instruct() for routing Bash DB queries via tidewave (id=48046023) #64
Open
NiveditJain
wants to merge
1
commit into
main
from
hn-deny-vs-instruct-tidewave-48046023
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| **HN:** https://news.ycombinator.com/item?id=48046023 (parent: https://news.ycombinator.com/item?id=48051949) | ||
|
|
||
| **Story / OP:** Show HN: Agent-skills-eval - Test whether Agent Skills improve outputs (link-only submission to https://github.com/darkrishabh/agent-skills-eval; no toptext on the HN post). 1 day old, 72 points, 36 comments. | ||
|
|
||
| **Status:** draft (pending manual post) | ||
|
|
||
| **The post:** | ||
|
|
||
| OP is a link-only Show HN of an eval harness that A/B tests whether Claude Code Skills improve outputs over the baseline. No body text on the HN submission itself. The substantive sub-thread is reedlaw's chain about hooks not being able to enforce routing (parent comment below). | ||
|
|
||
| **Parent comment (reedlaw, id=48051949), verbatim:** | ||
|
|
||
| > I tried to create a hook that would detect when token usage was running out and write HANDOFF.md so I could switch to another agent and finish the current task. It never worked reliably. To make a hook for db queries, it would need to run before each bash call, check if it looks like a query, and then exit with a new prompt, e.g.: "Use tidewave's execute_sql_query for DB access". But then it could just ignore the prompt the same as CLAUDE.me. What if I really wanted to use bash for a specific task? The real issue is that prompts are not tightly coupled with capabilities. If we admit that, then skills are over hyped. | ||
|
|
||
| (The grandparent, reedlaw's id=48050489, gives the concrete failure: Opus 4.7 ignores a 720-byte CLAUDE.md telling it to route DB queries through tidewave's MCP, and instead does `Bash(DATABASE_URL=$(grep ... .env) echo "ok")`.) | ||
|
|
||
| **My reply:** | ||
|
|
||
| ``` | ||
| (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai) | ||
|
|
||
| The "ignore the prompt the same as CLAUDE.md" part is where deny() and instruct() diverge. instruct(msg) injects guidance and lets the tool call proceed; deny(msg) returns a tool-error to the agent, so the bash literally does not run and the model has to pick another path. For "agent routes around tidewave" the shape you want is deny, not instruct. Roughly: | ||
|
|
||
| customPolicies.add({ | ||
| name: "route-db-via-tidewave", | ||
| match: { events: ["PreToolUse"] }, | ||
| fn: async (ctx) => { | ||
| if (ctx.toolName !== "Bash") return allow(); | ||
| const cmd = ctx.toolInput?.command ?? ""; | ||
| if (/\b(psql|DATABASE_URL|SELECT |INSERT |UPDATE |DELETE )/i.test(cmd)) | ||
| return deny("Use tidewave's execute_sql_query MCP tool for DB access."); | ||
| return allow(); | ||
| }, | ||
| }); | ||
|
|
||
| "What if I really do want bash" is just another branch in the regex. Capability coupling lives in the hook return value rather than in the prompt text. | ||
| ``` | ||
|
|
||
| **Insight for the FailProof team:** | ||
|
|
||
| The reedlaw thread is the cleanest articulation I've seen on HN of the *routing* (vs. blocking) use case for PreToolUse - "ignored CLAUDE.md => need a hook that intercepts the bypass path and force-redirects to the MCP tool". That's a different conversation from the destructive-ops gating story we usually pitch. Worth a short blog post: "deny vs instruct: when to fail-the-tool-call vs when to inject-and-continue" with this exact tidewave routing example. Also: rirze (id=48051665) pushes back that hooks are hard "since the default approach it's using is call the URL directly" - which conflates hook-availability with hook-coverage. The point is the hook *can* match on URL-shaped Bash invocations too; pattern-match the agent's actual call, not the platonic call. A second blog beat. Both pieces would slot naturally into the next time someone on HN asks "how do I keep agent X from doing Y instead of Z." | ||
|
|
||
| **Notes / findings:** | ||
|
|
||
| - Thread is 1 day old, last comment 22 minutes before drafting - reply window is wide open. | ||
| - The grandparent (reedlaw id=48050489) is a concrete-failure-mode comment, exactly the shape the thread-fit gate accepts. No pitch-vibe risk if the reply stays on the deny-vs-instruct distinction. | ||
| - Show HN around it is about Skills evaluation, not policy enforcement, so I am replying mid-thread to a sub-conversation about hooks rather than at the top level. This stays on-topic for the sub-thread (hooks for routing) without hijacking the OP's product (skills eval). | ||
| - ASCII-only check: no em-dashes, en-dashes, fancy ellipses, curly quotes, or unicode arrows in the reply body. | ||
| - Reply form on /reply?id=48051949 returns the login wall for the unauthenticated profile, as expected. Posting happens on the user's side. | ||
| - Cross-thread duplicate guard: deny()-vs-instruct() framing has not appeared in earlier drafts. Earlier PRs covered transport-vs-hook (Lilith), MCP-surface-vs-PreToolUse (Faz), Docker-vs-intent (Armorer), workflow-shape-vs-invariant (BetterClaw), etc. - none of them led with the deny-as-tool-error semantics. | ||
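
The "just another branch in the regex" remark in the reply, and the point about matching URL-shaped Bash invocations, can be sketched as a standalone decision function. This is a minimal sketch, not the FailProof AI API: `allow` and `deny` below are local stand-ins for the helpers named in the reply, and `routeBash`, `BASH_DB_ALLOWLIST`, `URL_PATTERN`, and the second deny message are hypothetical names and values introduced only for illustration.

```typescript
// Local stand-ins for the allow()/deny() helpers named in the reply.
// Assumption: the real library's return shapes may differ.
type Decision =
  | { kind: "allow" }
  | { kind: "deny"; message: string };

const allow = (): Decision => ({ kind: "allow" });
const deny = (message: string): Decision => ({ kind: "deny", message });

// Same pattern as the reply: Bash commands that look like direct DB access.
const DB_PATTERN = /\b(psql|DATABASE_URL|SELECT |INSERT |UPDATE |DELETE )/i;

// Hypothetical escape hatch for "what if I really do want bash".
// Each entry must match the whole command and rejects chaining
// (;, &, |), so an allowed prefix cannot smuggle in a second command.
const BASH_DB_ALLOWLIST: RegExp[] = [
  /^pg_dump\b[^;&|]*$/,
  /^psql --version$/,
];

// Hypothetical pattern for URL-shaped calls: curl or wget followed by
// an http(s) URL within the same simple command.
const URL_PATTERN = /\b(curl|wget)\b[^|;&]*https?:\/\//i;

// Decide what happens to a Bash command before it runs.
function routeBash(cmd: string): Decision {
  const trimmed = cmd.trim();
  // The allowlist is checked first so deliberate Bash use still works.
  if (BASH_DB_ALLOWLIST.some((re) => re.test(trimmed))) return allow();
  if (DB_PATTERN.test(trimmed)) {
    return deny("Use tidewave's execute_sql_query MCP tool for DB access.");
  }
  if (URL_PATTERN.test(trimmed)) {
    // Placeholder message; the real redirect target depends on the project.
    return deny("Use the project's MCP tool instead of calling the URL directly.");
  }
  return allow();
}
```

Under these assumptions, the grandparent's `DATABASE_URL=$(grep ... .env) echo "ok"` bypass is denied, while `pg_dump "$DATABASE_URL" > backup.sql` passes through the allowlist. The regex is still a heuristic: a command such as `echo "select all"` would also be denied, so a real policy would likely need tighter patterns.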
Add a language tag to the fenced code block.
Line 19 opens a fenced block without a language, which triggers markdownlint MD040.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 19-19: Fenced code blocks should have a language specified
(MD040, fenced-code-language)