Experiment: Evolving nanochat with LLM Guided Mutations #439
badlogicmanpreet
Hi everyone 👋
I ran an evolutionary optimization experiment, evolve-nanochat, on nanochat's GPT using LLM-guided mutations and real training loss as the fitness signal. Sharing results in case they're useful.
The evolution used OpenEvolve (an open-source AlphaEvolve implementation) with Claude Opus 4, evolving gpt.py over ~60 iterations and evaluating candidates via real training runs (300–1000 steps).
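For anyone curious about the shape of the loop, here's a minimal sketch of the idea (LLM-guided mutation plus real training loss as fitness). This is not the OpenEvolve API: `run_training` and `mutate_with_llm` are hypothetical placeholders, and the selection strategy is just one plausible choice.

```python
import random

def run_training(candidate_source: str, max_steps: int) -> float:
    """Hypothetical helper: write `candidate_source` over gpt.py and run a short
    real nanochat training run, returning the final training loss."""
    raise NotImplementedError

def mutate_with_llm(parent_source: str) -> str:
    """Hypothetical helper: prompt an LLM (e.g. Claude) to propose a mutated
    version of the candidate file."""
    raise NotImplementedError

def fitness(candidate_source: str, steps: int = 300) -> float:
    # Fitness is the negated real training loss: lower loss -> higher fitness.
    return -run_training(candidate_source, max_steps=steps)

def evolve(seed_source: str, iterations: int = 60, population_size: int = 8) -> str:
    population = [(seed_source, fitness(seed_source))]
    for _ in range(iterations):
        # Tournament selection: take the fittest of a small random sample.
        parent, _ = max(random.sample(population, k=min(3, len(population))),
                        key=lambda p: p[1])
        child = mutate_with_llm(parent)
        population.append((child, fitness(child)))
        # Keep only the top `population_size` candidates.
        population = sorted(population, key=lambda p: p[1])[-population_size:]
    return max(population, key=lambda p: p[1])[0]
```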
Results
Techniques Explored by the Evolution
During evolution, a wide range of architectural and optimization techniques were explored (not all retained in the final model):
"SL"pattern)x * √n_embd)1 / √n_layer)Most of these provided marginal or unstable gains at nanochat scale.
parameter-matched SwiGLU and interleaved attention consistently delivered the strongest signal.
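As a rough illustration of the parameter-matched SwiGLU idea (a sketch only, not the exact module evolution produced for gpt.py): a standard 4× GELU MLP spends about 8·n_embd² parameters on two matrices, so a SwiGLU block with three matrices matches that budget with a hidden size of roughly (8/3)·n_embd.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """SwiGLU MLP sized to match the parameter count of a 4x GELU MLP."""

    def __init__(self, n_embd: int):
        super().__init__()
        # 4x GELU MLP: two matrices of d x 4d -> ~8*d^2 params.
        # SwiGLU uses three matrices, so shrink the hidden dim to ~8/3 * d.
        hidden = int(8 * n_embd / 3)
        hidden = (hidden + 63) // 64 * 64  # round up to a multiple of 64
        self.w_gate = nn.Linear(n_embd, hidden, bias=False)
        self.w_up = nn.Linear(n_embd, hidden, bias=False)
        self.w_down = nn.Linear(hidden, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(gate) * up, then project back down to n_embd.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```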
Takeaway
Architectural tweaks reliably improved training loss and speed, but not validation loss. Improving generalization likely needs data- or training-level changes rather than architecture alone.
Reproduction
Code and evolution setup:
👉 https://github.com/badlogicmanpreet/evolve-nanochat
Huge thanks to @karpathy for nanochat; the clean, minimal design makes exploration and learning a joy. 👍