12 changes: 3 additions & 9 deletions .gitignore
@@ -1,9 +1,3 @@
-.DS_Store
-node_modules/
-.vscode/
-.idea/
-*.log
-.env
-.env.local
-dist/
-build/
+.vercel/
+.gitignore.local
+/tmp/
62 changes: 49 additions & 13 deletions README.md
@@ -1,10 +1,10 @@
# Token-Ignition

-> The task is the interview. Pass it, and you join the research.
+> Model horizons double every four months. The scaffolds around them do not. That gap is the work.

Token-Ignition is a **selection gate for AI-native researchers**.

-We don't hire on résumés, pitches, or intro calls. We invert the interview: you define a task, you build a system, and if that system can evolve itself and clear an AI-audited gate, you're invited into the research group.
+We don't hire on résumés, pitches, or intro calls. We invert the interview: you define a long-horizon task, you build a scaffold that evolves itself, and if that scaffold clears an AI-audited gate, you're invited into the research group.

Tokens are how we make that possible — not why we do it.

@@ -15,42 +15,78 @@ Tokens are how we make that possible — not why we do it.
This repository hosts the **v0.1 protocol test** of Token-Ignition:

- static frontend (`index.html` + `assets/`) — live at the Token-Ignition submission site
-- meta-rules, protocol spec, and submission schema (see `/spec`)
+- meta-rules, protocol spec, and submission schema (in-page + this README)
- public ledger of submission hashes — AI audit results appended by bot (to be wired)

No backend is wired yet. Submissions in v0.1 are captured client-side and hashed into a local ledger for visual/UX testing; the real submission pipeline (GitHub issue / serverless endpoint → AI audit workflow → ledger append) will land in v0.2.
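
For illustration, a minimal sketch of that client-side step (the field names and ledger-entry format are assumptions, not the real v0.1 schema): serialize the submission, hash it with SHA-256 via WebCrypto, and append the digest to the local ledger.

```ts
// Hypothetical sketch of the v0.1 client-side capture. Field names and the
// entry format are illustrative only; the real schema lives in the frontend.
interface Submission {
  pseudonym: string; // R5: submissions are accepted under pseudonym
  task: string;      // R1: the task you defined
  endpoint: string;  // the live, AI-readable URL
}

async function ledgerEntry(s: Submission): Promise<string> {
  const bytes = new TextEncoder().encode(JSON.stringify(s));
  const digest = await crypto.subtle.digest("SHA-256", bytes); // WebCrypto, browser-native
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return `${new Date().toISOString()} ${hex}`; // appended to the local ledger
}
```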

---

## The thesis

Model capability doubles roughly every four months — the cadence METR measured has only accelerated since their original paper. The engineering around it — memory, tools, planning loops, self-correction, long-horizon coherence — moves at a human pace, and caps out several orders of magnitude below what the models themselves could support.

This lab exists to close that gap. We are climbing a ladder of **token horizons per single coherent task**, from 10⁶ to 10¹². Each rung unlocked becomes a product.

```
10¹² ──────────── ? the research frontier
10¹¹ ─────────── autonomous long-horizon research
10¹⁰ ────────── multi-month project execution
10⁹ ───────── self-evolving codebases
10⁸ ──────── gate.3 — ignition
10⁷ ─────── gate.2 — verified
10⁶ ────── gate.1 — admission
```

The first three rungs are the admission ramp. What happens above them is the lab.

---

## Three axes of self-evolution

A self-evolving scaffold must move along at least one of these axes between runs — without a human editing prompts, weights, or code in between. The third is rare. We weight it highest.

| axis | what it means |
|------|---------------|
| **01 // behavior** | The scaffold changes *how* it acts across runs — its policies, decision rules, strategy. |
| **02 // knowledge** | The scaffold accumulates, distills, or restructures *what it knows* across runs. |
| **03 // scaffold** | The scaffold modifies *itself* across runs — its tools, control loop, evaluation criteria, its successor. |

A system that only rewrites its prompt is not, by itself, a self-evolving scaffold.
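
As a deliberately toy illustration of axes 01 and 02, here is a sketch of a run loop that persists a policy and a knowledge store between runs and updates both itself (file name, fields, and scoring are all assumptions, not a reference implementation):

```ts
// Toy scaffold iteration: act, learn, rewrite its own state between runs,
// with no human edits in between. Everything named here is illustrative.
import { readFileSync, writeFileSync, existsSync } from "node:fs";

interface State {
  policy: { temperature: number; maxRetries: number }; // axis 01: how it acts
  notes: string[];                                      // axis 02: what it knows
}

function loadState(path: string): State {
  if (!existsSync(path)) {
    return { policy: { temperature: 0.7, maxRetries: 2 }, notes: [] };
  }
  return JSON.parse(readFileSync(path, "utf8"));
}

function runTask(state: State): { score: number; lesson: string } {
  // Stand-in for the real long-horizon task execution (model calls go here).
  return { score: Math.random(), lesson: `ran with retries=${state.policy.maxRetries}` };
}

const state = loadState("state.json");
const result = runTask(state);
if (result.score < 0.5) state.policy.maxRetries += 1; // axis 01: behavior update
state.notes.push(result.lesson);                      // axis 02: knowledge update
writeFileSync("state.json", JSON.stringify(state, null, 2));
// Axis 03 goes further: the loop rewrites this very file, or its successor.
```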

---

## The three gates

| gate | budget | unlock condition |
| --- | --- | --- |
| `gate.1` — admission | 1M tokens | any well-formed submission is admitted |
-| `gate.2` — verified | 10M tokens | AI auditor confirms reproducible self-evolution on gate.1 artifact |
-| `gate.3` — research | 100M tokens | emergent behavior verified by consensus of ≥3 independent models; standing invitation to join the research group |
+| `gate.2` — verified | 10M tokens | AI auditor confirms reproducible self-evolution on the gate.1 artifact, with a real delta against ablation |
+| `gate.3` — research | 100M tokens | scaffold-level evolution — not merely behavioral drift — verified by consensus of ≥3 independent models; standing invitation to join the research group |

Clearing `gate.3` is how you get in. Tokens are the side-effect that lets you keep going.
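
The same ladder as data, in case you want to wire it into tooling (a hypothetical config shape, not a published schema):

```ts
// The gate ladder as data. Shape and strings are assumptions for illustration.
const gates = [
  { id: "gate.1", budget: 1_000_000, unlock: "any well-formed submission" },
  { id: "gate.2", budget: 10_000_000, unlock: "reproducible self-evolution + ablation delta" },
  { id: "gate.3", budget: 100_000_000, unlock: "scaffold-level evolution, >=3-model consensus" },
] as const;
```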

---

## Meta-rules

-- **R1** — you define the task; it must require a system that evolves itself.
+- **R1** — you define the task; it must be long-horizon, and must require a scaffold that evolves itself to push the achievable horizon further.
- **R2** — you define the evaluation criterion; it must be reproducible and machine-verifiable.
-- **R3** — you build the system; the system, not you, produces the final output.
+- **R3** — you build the scaffold; the scaffold, not you, produces the final output.
- **R4** — all submissions are AI-judged; human audit is random and post-hoc.
- **R5** — identity is irrelevant; submissions are accepted under pseudonym.
- **R6** — selection is gated; pass a gate, unlock more resources; pass the final gate, join the research.
- **R7** — ablation is required; you must submit a baseline run on the same model with the minimal scaffold. The delta is the evidence. A scaffold that cannot beat its own ablation is not evolving — it is merely present.
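
R7 reduces to one expression (a sketch; the log shape and score semantics are assumptions): the evidence is the difference between two runs of the same model on the same task.

```ts
// R7 as a sketch: same model, same task, minimal scaffold vs. full scaffold.
// RunLog is a hypothetical shape, not the protocol's submission format.
interface RunLog {
  artifactHash: string; // final artifact hash from the reproducibility log
  score: number;        // your machine-verifiable evaluation criterion (R2)
}

function ablationDelta(full: RunLog, baseline: RunLog): number {
  return full.score - baseline.score; // the delta is the evidence
}
// A delta at or below zero is R7's failure mode: merely present, not evolving.
```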

---

## What counts as a valid submission

-1. **Self-evolution, not prompt-engineering.** The system must modify its own behavior across iterations without human edits to prompts, weights, or code between runs.
+1. **Self-evolution, not prompt-engineering.** The scaffold must modify its own behavior, knowledge, or structure across iterations without human edits between runs.
2. **Machine-verifiable output.** The evaluation criterion must be checkable by an AI auditor with no proprietary access — public endpoint, public artifact.
3. **Live, AI-readable endpoint.** You must provide a URL an AI can crawl. HTML is fine; JSON / OpenAPI / plain text are better. Logins, captchas, GUIs are not accepted.
-4. **Reproducibility micro-run.** Attach at least one log of a full run: inputs, intermediate state, final artifact hash. Our auditor re-runs a randomly sampled slice.
+4. **Ablation baseline.** A second endpoint or log: same model, minimal scaffold, same task. Your scaffold's contribution is the delta between this and your main endpoint.
+5. **Reproducibility micro-run.** Attach at least one log of a full run: inputs, intermediate state, final artifact hash. Our auditor re-runs a randomly sampled slice.

If the AI auditor cannot independently verify your artifact, the submission is rejected. We do not email you for clarifications. **The endpoint is the application.**
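
To make "the endpoint is the application" concrete, here is roughly what any independent checker can do (a sketch, not our actual audit workflow): fetch the public artifact, re-hash it, and compare against the hash claimed in your run log.

```ts
// Illustrative independent check, not the real audit pipeline: re-derive the
// artifact hash from the public endpoint and compare it to the claimed one.
import { createHash } from "node:crypto";

async function verifyArtifact(url: string, claimedHash: string): Promise<boolean> {
  const res = await fetch(url); // must be reachable without logins or captchas
  if (!res.ok) return false;    // unreachable endpoint => rejected
  const body = Buffer.from(await res.arrayBuffer());
  const actual = createHash("sha256").update(body).digest("hex");
  return actual === claimedHash;
}
```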

@@ -70,10 +106,10 @@ The frontend is vanilla HTML/CSS/JS — no build step. Copy / translate strings

## Roadmap

-- [x] v0.1 — static protocol test site, hybrid terminal + form UX, bilingual EN/ZH
-- [ ] v0.2 — GitHub-repo-as-backend: submissions become issues/PRs, AI audit runs as GitHub Action, ledger appends as commit
-- [ ] v0.3 — multi-model consensus judge for `gate.3` (3+ independent models), random human audit sampler
-- [ ] v0.4 — public API spec for submission endpoint requirements (`/benchmark`, artifact hash conventions)
+- [x] v0.1 — static protocol test site: thesis-layer (ladder, axes, manifesto), R7 ablation rule, bilingual EN/ZH, 7-field submission form with axis declaration + ablation baseline
+- [ ] v0.2 — GitHub-repo-as-backend: submissions become issues/PRs, AI audit runs as GitHub Action (multi-model consensus for gate.3), ledger appends as commit, axis-tagged scoring
+- [ ] v0.3 — multi-model consensus judge formalized, random human audit sampler, post-hoc reproducibility replays
+- [ ] v0.4 — public API spec for submission endpoint requirements (`/benchmark`, `/baseline`, artifact hash conventions)

---
