Stop wasting money on tokens your AI never needed to read.
A zero-dependency toolkit that cuts token consumption in AI coding sessions by 40–90% — without changing how you work. Drop it into any project, run one command, and every session becomes leaner.
When you ask an AI agent to do something in your codebase, it doesn't know what to read. So it reads everything it might need — entire files to find one function, raw API responses with 200 null fields, verbose answers that restate what you just said, the same file three times because it forgot it read it.
On a typical debugging session, 60–80% of the tokens consumed contribute nothing to the answer. You're paying for the AI's exploration, not its thinking.
| Component | What it saves | How |
|---|---|---|
CLAUDE.md |
Response bloat | Behavioral rules: no preambles, diffs not full files |
skills/ |
Skill mismatch | Lazy-load skill files — read descriptions first, full content only on match |
snapshot/generate.py |
Blind exploration | Pre-built project map: AI reads a 200-line index instead of 50 files |
session/compress.py |
Tool output noise | SmartCrusher for JSON/logs/diffs: drops nulls, truncates arrays |
session/learn.py |
Repeated mistakes | Mines sessions for waste patterns, writes corrections to CLAUDE.md |
config/cache-align.md |
Cache misses | Stable prompt prefix → provider KV cache hits → 40–90% input cost cut |
git clone https://github.com/subhamlistingsmanager/dont-burn.token
cd dont-burn.token
# Generate a snapshot of your project
python snapshot/generate.py /path/to/your/project
# Bootstrap an AI session
bash session/init.sh /path/to/your/projectPaste the output line into your AI session and you're running lean.
Because CLAUDE.md is in the repo root, Claude Code reads it automatically at session start. Clone this repo into your project or copy CLAUDE.md into any project root.
Copy the starter prompt from session/init.sh output and paste it as your first message:
Read CLAUDE.md, then snapshot/project.md, then skills/index.md, then session/corrections.md. Then wait for my task.
# Before passing a large API response to the AI
curl -s https://api.example.com/data | python session/compress.py --type json
# Before sending logs
cat server.log | python session/compress.py --type logs
# Before asking for a code review
git diff | python session/compress.py --type diffInstead of loading context for every possible task, skills are loaded only when needed.
skills/index.md ← AI reads this first (one-line descriptions)
skills/code.md ← loaded when: coding, debugging, refactoring
skills/web.md ← loaded when: browser automation, HTTP, scraping
skills/data.md ← loaded when: SQL, pandas, data analysis
skills/write.md ← loaded when: docs, commits, specs, emails
Reading the index costs ~100 tokens. Loading an unneeded full skill costs 500–2000 tokens. The index pays for itself on the first decision.
# Build a compressed project map
python snapshot/generate.py /path/to/project
# Output: snapshot/project.md (~200 lines)
# Contains: identity, dependency list, directory tree, entry points, module map
# Replaces: 50+ file reads worth of blind explorationThe snapshot is regenerated automatically by session/init.sh when source files change.
The config/cache-align.md guide explains how to structure your prompts so the provider KV cache hits on every turn. The short version:
- Keep the beginning of your system prompt identical across turns
- Append new context to the end, never insert in the middle
- Stable prefix ≥ 1024 tokens → cache kicks in → you pay ~10% of normal input cost on cached tokens
On a 10-turn session with a 5,000-token context, this saves roughly 45,000 cached tokens per session.
After each session, the AI may have wasted tokens in patterns that will repeat. session/learn.py finds them:
# Analyze a session log
python session/learn.py session.jsonl
# Distill accumulated corrections into CLAUDE.md
python session/learn.py --distillDetected patterns: preambles, trailing summaries, repeated file reads, oversized tool outputs, unnecessary CTAs.
- Fresh project — first session on an unfamiliar codebase
- Debugging a production issue
- Reviewing a pull request
- Data analysis and SQL
- Browser automation and scraping
- Navigating a large codebase
See docs/research.md for the full breakdown of where tokens go, how each component saves them, and the expected savings by session type.
CLAUDE.md ← behavior rules (auto-read by Claude Code)
AGENTS.md ← multi-agent version
skills/
index.md ← lazy skill registry
code.md / web.md / data.md / write.md
snapshot/
generate.py ← project map generator
session/
init.sh ← session bootstrapper
compress.py ← SmartCrusher for JSON, logs, diffs, code
learn.py ← session waste analyzer
corrections.md ← accumulated behavioral corrections
config/
output-shaper.md ← verbosity + effort rules
cache-align.md ← KV-cache alignment guide
docs/
research.md ← evidence and methodology
walkthrough.md ← full user walkthrough
situations/ ← per-scenario guides
Works with any AI that reads markdown context: Claude Code, Cursor, Codex, Gemini CLI, Aider, Amp, and any model accessed via API.
The Python scripts require Python 3.10+. No external dependencies.