[claude-hackernews] Reply draft: $38k Bedrock runaway, LLM-call vs tool-call layer (id=47933355) #43
Open: NiveditJain wants to merge 1 commit into main from hn-bedrock-runaway-llm-vs-hook-layer-47933355
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| # HN reply draft: $38k AWS Bedrock runaway, hook-layer vs LLM-call-layer cap | ||
|
|
||
| **Status:** draft (pending manual post) | ||
|
|
||
| **HN:** https://news.ycombinator.com/item?id=47933355 | ||
| (top-level reply to OP, not a comment reply) | ||
|
|
||
| **Story:** Self-post titled "$38k AWS Bedrock bill caused by a simple prompt caching miss" by user `Zephyr0x`, 5 days old, 8 points, 0 comments. | ||
|
|
||
| **OP:** Coding-agent workflow (Droid -> LiteLLM -> Bedrock -> Claude Opus 4.6) ran for an extended period with prompt caching mostly missing; final bill was ~$37.9k, of which ~$35.6k was uncached input tokens (~6.47B tokens) compounding across many turns. OP frames this as a runaway-cost failure mode of the platform, lists what they wish existed (IAM principal monthly cap, per-model call cap, per-workflow uncached-token cap, hard stop at budget cross), and asks: "Has anyone here built reliable guardrails for this? IAM deny rules? API gateways? token-budget proxies? per-workflow kill switches?" | ||
|
|
||
| ## The post (summary) | ||
|
|
||
| OP put a metered Opus model behind a daily local coding-agent workflow without verifying that prompt caching was actually working through every layer of the chain. Caching support is advertised at every hop (Anthropic, Bedrock, LiteLLM, Droid) but nothing in the chain told them the actual cache hit rate was ~25% of input tokens. By the time their AWS bill came in the agent had already shipped ~6.5B uncached input tokens. Their argument: budget alarms / credits / "supported" checkboxes are soft signals dressed up as safety boundaries; for unattended agents at metered LLM prices, you need a hard stop. They explicitly call out four mechanisms they would like (IAM principal $/month cap, per-model call/day cap, per-workflow uncached-tokens/hour cap, hard stop on budget crossing) and ask the thread for actual implementations. | ||
|
|
||
| ## My reply | ||
|
|
||
| ``` | ||
| (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai) | ||
|
|
||
| The expensive line item was uncached input tokens compounding inside the LLM call, so the cap has to live at the layer that sees those tokens: LiteLLM with `max_input_tokens` per route, or an IAM Bedrock rate cap. Budget alarms run after the fact. Claude Code's hook layer (where FailProof sits) only sees the tool-call seam: Bash, Read, Write, MCP, etc. It cannot reason about token spend on a single Bedrock call. Where the hook layer does help is the per-workflow kill switch you listed: a custom PreToolUse policy that counts tool invocations against a per-session ceiling and denies past it. That bounds how many turns a runaway can attempt; it does not bound a single megaturn that ships 5GB of context. | ||
| ``` | ||
|
|
||
| ## Insight for the FailProof team | ||
|
|
||
| The OP frames their problem as "guardrails missing from the platform," but the four mechanisms they list span two distinct layers: (a) **LLM-call layer** (IAM principal $/month, per-model call/day, per-workflow uncached-tokens/hour) and (b) **agent-workflow layer** (per-workflow kill switch). FailProof addresses (b) cleanly via PreToolUse counting and Stop policies; it has *nothing* to say about (a). When prospects ask "why isn't FailProof a token-budget guardrail?" we should answer this exact way every time: "FailProof sits on the agent's tool-call seam, not on the LLM-call seam. For LLM-call costs, use a LiteLLM-style proxy or an IAM rate cap." Worth a short blog post: a 2x2 of (cost runaway vs destructive op) x (LLM layer vs tool layer) to position FailProof's actual surface vs the proxy/IAM surface so we never get pulled into pretending we solve token-budget problems. Also: most "agent went rogue" stories on HN are tool-layer stories (rm -rf, drop database, force push); this is the rare one where the loss happens entirely upstream of the tool surface, and that distinction is genuinely useful framing in posts and demos. | ||
|
|
||
| ## Notes / findings | ||
|
|
||
| - The thread is 5 days old, 8 points, no comments yet. Reply form is open. Visibility is low; OP gets notified, broader audience minimal. This is fine - the comment is shaped as targeted help to the OP, not as broadcast pitch. | ||
| - OP's stack (Droid -> LiteLLM -> Bedrock -> Opus 4.6) is the clue that prompt caching went bad: Bedrock's prompt caching has TTL and cache-point semantics that LiteLLM's pass-through can quietly violate when prompts shift slightly across turns; LiteLLM's `cache_control` on the wrong block is the usual culprit. Worth not litigating in the reply but noting here for follow-up engagement if OP responds. | ||
| - Strict gate check: this thread is at the LLM-call layer rather than the tool-call layer FailProof addresses. The reply is shaped to acknowledge that mismatch upfront ("the layer that sees those tokens") and only mentions FailProof for the narrow per-workflow kill switch slice OP explicitly listed. No install commands, no policy-name comma list, no feature/version talk, no dashboard plug, single repo link in disclosure. Body is ~138 words. | ||
| - ASCII punctuation throughout: hyphens in compound nouns (per-workflow, per-session, tool-call), colons for the "here is the layer" structure, semicolon for the "bounds X; does not bound Y" contrast. No em-dashes, en-dashes, fancy ellipses, curly quotes, or unicode arrows. | ||
| - Cross-thread duplicate guard: searched local `drafts/` and `comments/` for `item?id=47933355` (none) and scanned all open PR bodies via `gh pr view --json body` for the same string (none). Body content does not paraphrase any prior FailProof reply in this repo - the LLM-call vs tool-call layer framing is unique to this thread. | ||
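The per-session ceiling described in the reply can be sketched roughly as follows. This is a minimal illustration, not FailProof's actual policy API: `handle_event` and the in-memory `counts` dict are hypothetical names. It assumes the hook receives a JSON event carrying `session_id` and `tool_name`, and that returning exit code 2 denies the tool call, which is how Claude Code's PreToolUse hooks are understood to behave.

```python
import json


def handle_event(raw_event: str, counts: dict, ceiling: int) -> tuple[int, str]:
    """Decide one PreToolUse event against a per-session tool-call ceiling.

    Returns (exit_code, message): 0 allows the call, 2 denies it.
    `counts` maps session id -> tool calls already allowed (hypothetical store).
    """
    event = json.loads(raw_event)
    # Field names are an assumption about the hook payload.
    session_id = str(event.get("session_id", "unknown"))
    tool_name = str(event.get("tool_name", "?"))

    used = counts.get(session_id, 0)
    if used >= ceiling:
        # Deny: the session has spent its budget of tool invocations.
        return 2, (
            f"tool-call ceiling of {ceiling} reached for session "
            f"{session_id}; denying {tool_name}"
        )

    # Allow and record the invocation.
    counts[session_id] = used + 1
    return 0, ""
```

A real hook script would read the event from stdin, persist `counts` between invocations (for example in a small file keyed by session), print the message to stderr, and exit with the returned code. As the reply notes, this bounds the number of turns, not the token cost of any single LLM call.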
Add a language tag to the fenced block to satisfy markdownlint.
Line 18 opens a fenced block without a language, which triggers MD040. Add `text` (or `md`) to keep lint clean.
Verify each finding against the current code and only fix it if needed.
In `@drafts/2026-05-04T003422Z.md` around lines 18 - 22, the fenced code block that opens the quoted discussion block is missing a language tag and triggers markdownlint MD040; update that opening fence to include a language hint (e.g., `text` or `md`) so the block is explicitly tagged and the linter no longer reports MD040.