12 changes: 3 additions & 9 deletions .gitignore
@@ -1,9 +1,3 @@
-.DS_Store
-node_modules/
-.vscode/
-.idea/
-*.log
-.env
-.env.local
-dist/
-build/
+.vercel/
+.gitignore.local
+/tmp/
62 changes: 49 additions & 13 deletions README.md
@@ -1,10 +1,10 @@
# Token-Ignition

-> The task is the interview. Pass it, and you join the research.
+> Model horizons double every four months. The scaffolds around them do not. That gap is the work.

Token-Ignition is a **selection gate for AI-native researchers**.

-We don't hire on résumés, pitches, or intro calls. We invert the interview: you define a task, you build a system, and if that system can evolve itself and clear an AI-audited gate, you're invited into the research group.
+We don't hire on résumés, pitches, or intro calls. We invert the interview: you define a long-horizon task, you build a scaffold that evolves itself, and if that scaffold clears an AI-audited gate, you're invited into the research group.

Tokens are how we make that possible — not why we do it.

@@ -15,42 +15,78 @@ Tokens are how we make that possible — not why we do it.
This repository hosts the **v0.1 protocol test** of Token-Ignition:

- static frontend (`index.html` + `assets/`) — live at the Token-Ignition submission site
-- meta-rules, protocol spec, and submission schema (see `/spec`)
+- meta-rules, protocol spec, and submission schema (in-page + this README)
- public ledger of submission hashes — AI audit results appended by bot (to be wired)

No backend is wired yet. Submissions in v0.1 are captured client-side and hashed into a local ledger for visual/UX testing; the real submission pipeline (GitHub issue / serverless endpoint → AI audit workflow → ledger append) will land in v0.2.
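
For illustration, a minimal sketch of that client-side step (the field names and ledger-entry format are assumptions, not the real v0.1 schema): serialize the submission, hash it with SHA-256 via WebCrypto, and append the digest to the local ledger.

```ts
// Hypothetical sketch of the v0.1 client-side capture. Field names and the
// entry format are illustrative only; the real schema lives in the frontend.
interface Submission {
  pseudonym: string; // R5: submissions are accepted under pseudonym
  task: string;      // R1: the task you defined
  endpoint: string;  // the live, AI-readable URL
}

async function ledgerEntry(s: Submission): Promise<string> {
  const bytes = new TextEncoder().encode(JSON.stringify(s));
  const digest = await crypto.subtle.digest("SHA-256", bytes); // WebCrypto, browser-native
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return `${new Date().toISOString()} ${hex}`; // appended to the local ledger
}
```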

---

## The thesis

Model capability doubles roughly every four months — the cadence METR measured has only accelerated since their original paper. The engineering around it — memory, tools, planning loops, self-correction, long-horizon coherence — moves at a human pace, and caps out several orders of magnitude below what the models themselves could support.

This lab exists to close that gap. We are climbing a ladder of **token horizons per single coherent task**, from 10⁶ to 10¹². Each rung unlocked becomes a product.

```
10¹² ──────────── ? the research frontier
10¹¹ ─────────── autonomous long-horizon research
10¹⁰ ────────── multi-month project execution
10⁹ ───────── self-evolving codebases
10⁸ ──────── gate.3 — ignition
10⁷ ─────── gate.2 — verified
10⁶ ────── gate.1 — admission
```

The first three rungs are the admission ramp. What happens above them is the lab.

---

## Three axes of self-evolution

A self-evolving scaffold must move along at least one of these axes between runs — without a human editing prompts, weights, or code in between. The third is rare. We weight it highest.

| axis | what it means |
|------|---------------|
| **01 // behavior** | The scaffold changes *how* it acts across runs — its policies, decision rules, strategy. |
| **02 // knowledge** | The scaffold accumulates, distills, or restructures *what it knows* across runs. |
| **03 // scaffold** | The scaffold modifies *itself* across runs — its tools, control loop, evaluation criteria, its successor. |

A system that only rewrites its prompt is not, by itself, a self-evolving scaffold.
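
As a deliberately toy illustration of axes 01 and 02, here is a sketch of a run loop that persists a policy and a knowledge store between runs and updates both itself (file name, fields, and scoring are all assumptions, not a reference implementation):

```ts
// Toy scaffold iteration: act, learn, rewrite its own state between runs,
// with no human edits in between. Everything named here is illustrative.
import { readFileSync, writeFileSync, existsSync } from "node:fs";

interface State {
  policy: { temperature: number; maxRetries: number }; // axis 01: how it acts
  notes: string[];                                      // axis 02: what it knows
}

function loadState(path: string): State {
  if (!existsSync(path)) {
    return { policy: { temperature: 0.7, maxRetries: 2 }, notes: [] };
  }
  return JSON.parse(readFileSync(path, "utf8"));
}

function runTask(state: State): { score: number; lesson: string } {
  // Stand-in for the real long-horizon task execution (model calls go here).
  return { score: Math.random(), lesson: `ran with retries=${state.policy.maxRetries}` };
}

const state = loadState("state.json");
const result = runTask(state);
if (result.score < 0.5) state.policy.maxRetries += 1; // axis 01: behavior update
state.notes.push(result.lesson);                      // axis 02: knowledge update
writeFileSync("state.json", JSON.stringify(state, null, 2));
// Axis 03 goes further: the loop rewrites this very file, or its successor.
```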

---

## The three gates

| gate | budget | unlock condition |
| --- | --- | --- |
| `gate.1` — admission | 1M tokens | any well-formed submission is admitted |
-| `gate.2` — verified | 10M tokens | AI auditor confirms reproducible self-evolution on gate.1 artifact |
-| `gate.3` — research | 100M tokens | emergent behavior verified by consensus of ≥3 independent models; standing invitation to join the research group |
+| `gate.2` — verified | 10M tokens | AI auditor confirms reproducible self-evolution on the gate.1 artifact, with a real delta against ablation |
+| `gate.3` — research | 100M tokens | scaffold-level evolution — not merely behavioral drift — verified by consensus of ≥3 independent models; standing invitation to join the research group |

Clearing `gate.3` is how you get in. Tokens are the side-effect that lets you keep going.
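
The same ladder as data, in case you want to wire it into tooling (a hypothetical config shape, not a published schema):

```ts
// The gate ladder as data. Shape and strings are assumptions for illustration.
const gates = [
  { id: "gate.1", budget: 1_000_000, unlock: "any well-formed submission" },
  { id: "gate.2", budget: 10_000_000, unlock: "reproducible self-evolution + ablation delta" },
  { id: "gate.3", budget: 100_000_000, unlock: "scaffold-level evolution, >=3-model consensus" },
] as const;
```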

---

## Meta-rules

-- **R1** — you define the task; it must require a system that evolves itself.
+- **R1** — you define the task; it must be long-horizon, and must require a scaffold that evolves itself to push the achievable horizon further.
- **R2** — you define the evaluation criterion; it must be reproducible and machine-verifiable.
-- **R3** — you build the system; the system, not you, produces the final output.
+- **R3** — you build the scaffold; the scaffold, not you, produces the final output.
- **R4** — all submissions are AI-judged; human audit is random and post-hoc.
- **R5** — identity is irrelevant; submissions are accepted under pseudonym.
- **R6** — selection is gated; pass a gate, unlock more resources; pass the final gate, join the research.
- **R7** — ablation is required; you must submit a baseline run on the same model with the minimal scaffold. The delta is the evidence. A scaffold that cannot beat its own ablation is not evolving — it is merely present.
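
R7 reduces to one expression (a sketch; the log shape and score semantics are assumptions): the evidence is the difference between two runs of the same model on the same task.

```ts
// R7 as a sketch: same model, same task, minimal scaffold vs. full scaffold.
// RunLog is a hypothetical shape, not the protocol's submission format.
interface RunLog {
  artifactHash: string; // final artifact hash from the reproducibility log
  score: number;        // your machine-verifiable evaluation criterion (R2)
}

function ablationDelta(full: RunLog, baseline: RunLog): number {
  return full.score - baseline.score; // the delta is the evidence
}
// A delta at or below zero is R7's failure mode: merely present, not evolving.
```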

---

## What counts as a valid submission

-1. **Self-evolution, not prompt-engineering.** The system must modify its own behavior across iterations without human edits to prompts, weights, or code between runs.
+1. **Self-evolution, not prompt-engineering.** The scaffold must modify its own behavior, knowledge, or structure across iterations without human edits between runs.
2. **Machine-verifiable output.** The evaluation criterion must be checkable by an AI auditor with no proprietary access — public endpoint, public artifact.
3. **Live, AI-readable endpoint.** You must provide a URL an AI can crawl. HTML is fine; JSON / OpenAPI / plain text are better. Logins, captchas, GUIs are not accepted.
-4. **Reproducibility micro-run.** Attach at least one log of a full run: inputs, intermediate state, final artifact hash. Our auditor re-runs a randomly sampled slice.
+4. **Ablation baseline.** A second endpoint or log: same model, minimal scaffold, same task. Your scaffold's contribution is the delta between this and your main endpoint.
+5. **Reproducibility micro-run.** Attach at least one log of a full run: inputs, intermediate state, final artifact hash. Our auditor re-runs a randomly sampled slice.

If the AI auditor cannot independently verify your artifact, the submission is rejected. We do not email you for clarifications. **The endpoint is the application.**
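
To make "the endpoint is the application" concrete, here is roughly what any independent checker can do (a sketch, not our actual audit workflow): fetch the public artifact, re-hash it, and compare against the hash claimed in your run log.

```ts
// Illustrative independent check, not the real audit pipeline: re-derive the
// artifact hash from the public endpoint and compare it to the claimed one.
import { createHash } from "node:crypto";

async function verifyArtifact(url: string, claimedHash: string): Promise<boolean> {
  const res = await fetch(url); // must be reachable without logins or captchas
  if (!res.ok) return false;    // unreachable endpoint => rejected
  const body = Buffer.from(await res.arrayBuffer());
  const actual = createHash("sha256").update(body).digest("hex");
  return actual === claimedHash;
}
```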

@@ -70,10 +106,10 @@ The frontend is vanilla HTML/CSS/JS — no build step. Copy / translate strings

## Roadmap

-- [x] v0.1 — static protocol test site, hybrid terminal + form UX, bilingual EN/ZH
-- [ ] v0.2 — GitHub-repo-as-backend: submissions become issues/PRs, AI audit runs as GitHub Action, ledger appends as commit
-- [ ] v0.3 — multi-model consensus judge for `gate.3` (3+ independent models), random human audit sampler
-- [ ] v0.4 — public API spec for submission endpoint requirements (`/benchmark`, artifact hash conventions)
+- [x] v0.1 — static protocol test site: thesis-layer (ladder, axes, manifesto), R7 ablation rule, bilingual EN/ZH, 7-field submission form with axis declaration + ablation baseline
+- [ ] v0.2 — GitHub-repo-as-backend: submissions become issues/PRs, AI audit runs as GitHub Action (multi-model consensus for gate.3), ledger appends as commit, axis-tagged scoring
+- [ ] v0.3 — multi-model consensus judge formalized, random human audit sampler, post-hoc reproducibility replays
+- [ ] v0.4 — public API spec for submission endpoint requirements (`/benchmark`, `/baseline`, artifact hash conventions)

---
