Experiment: Evolving nanochat with LLM Guided Mutations #439
badlogicmanpreet
Hi everyone 👋
I ran an evolutionary optimization experiment, evolve-nanochat, on nanochat's GPT using LLM-guided mutations and real training loss as the fitness signal. Sharing results in case they're useful.
The evolution used OpenEvolve (an open-source AlphaEvolve implementation) with Claude Opus 4, evolving gpt.py over ~60 iterations and evaluating candidates via real training runs (300–1000 steps).
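For anyone curious about the shape of the loop, here's a minimal sketch of the idea (LLM-guided mutation plus real training loss as fitness). This is not the OpenEvolve API: `run_training` and `mutate_with_llm` are hypothetical placeholders, and the selection strategy is just one plausible choice.

```python
import random

def run_training(candidate_source: str, max_steps: int) -> float:
    """Hypothetical helper: write `candidate_source` over gpt.py and run a short
    real nanochat training run, returning the final training loss."""
    raise NotImplementedError

def mutate_with_llm(parent_source: str) -> str:
    """Hypothetical helper: prompt an LLM (e.g. Claude) to propose a mutated
    version of the candidate file."""
    raise NotImplementedError

def fitness(candidate_source: str, steps: int = 300) -> float:
    # Fitness is the negated real training loss: lower loss -> higher fitness.
    return -run_training(candidate_source, max_steps=steps)

def evolve(seed_source: str, iterations: int = 60, population_size: int = 8) -> str:
    population = [(seed_source, fitness(seed_source))]
    for _ in range(iterations):
        # Tournament selection: take the fittest of a small random sample.
        parent, _ = max(random.sample(population, k=min(3, len(population))),
                        key=lambda p: p[1])
        child = mutate_with_llm(parent)
        population.append((child, fitness(child)))
        # Keep only the top `population_size` candidates.
        population = sorted(population, key=lambda p: p[1])[-population_size:]
    return max(population, key=lambda p: p[1])[0]
```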
Results
Techniques Explored by the Evolution
During evolution, a wide range of architectural and optimization techniques were explored (not all retained in the final model):
"SL"pattern)x * √n_embd)1 / √n_layer)Most of these provided marginal or unstable gains at nanochat scale.
parameter-matched SwiGLU and interleaved attention consistently delivered the strongest signal.
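As a rough illustration of the parameter-matched SwiGLU idea (a sketch only, not the exact module evolution produced for gpt.py): a standard 4× GELU MLP spends about 8·n_embd² parameters on two matrices, so a SwiGLU block with three matrices matches that budget with a hidden size of roughly (8/3)·n_embd.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """SwiGLU MLP sized to match the parameter count of a 4x GELU MLP."""

    def __init__(self, n_embd: int):
        super().__init__()
        # 4x GELU MLP: two matrices of d x 4d -> ~8*d^2 params.
        # SwiGLU uses three matrices, so shrink the hidden dim to ~8/3 * d.
        hidden = int(8 * n_embd / 3)
        hidden = (hidden + 63) // 64 * 64  # round up to a multiple of 64
        self.w_gate = nn.Linear(n_embd, hidden, bias=False)
        self.w_up = nn.Linear(n_embd, hidden, bias=False)
        self.w_down = nn.Linear(hidden, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(gate) * up, then project back down to n_embd.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```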
Takeaway
Architectural tweaks reliably improved training loss and speed, but not validation loss. Improving generalization likely needs data- or training-level changes rather than architecture alone.
Reproduction
Code and evolution setup:
👉 https://github.com/badlogicmanpreet/evolve-nanochat
Huge thanks to @karpathy for nanochat; the clean, minimal design makes exploration and learning a joy. 👍