VNT is a research prototype that explores a Transformer variant with explicit control-inspired components:
- data flow (token states)
- instruction flow (operator routing)
- control flow (dynamic step decisions)
This repository focuses on reproducible experiments, transparent baselines, and fast single-machine iteration.
- Baseline and VNT training pipelines are runnable.
- Stable and experimental VNT implementations are separated.
- Auto-logging for PPL, approximate FLOPs, and average micro-steps is included.
- A from-scratch pretraining scaffold is included (pretrain.py).
- Current VNT experiments do not yet show a consistent Pareto win over the baseline.
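
The Pareto claim above can be made concrete: a run counts as a Pareto win over another if it is no worse on both perplexity and compute and strictly better on at least one. The sketch below illustrates that check under stated assumptions. The Run record, its field names, and the helper functions are hypothetical and are not taken from pareto_report.py or from any log in this repository.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical summary of one run (field names are assumptions)."""
    name: str
    ppl: float    # validation perplexity, lower is better
    flops: float  # approximate compute, lower is better


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity from a mean per-token cross-entropy given in nats."""
    return math.exp(mean_nll_nats)


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.ppl <= b.ppl and a.flops <= b.flops
    strictly_better = a.ppl < b.ppl or a.flops < b.flops
    return no_worse and strictly_better


def pareto_front(runs: List[Run]) -> List[Run]:
    """Keep only the runs that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]
```

Under this reading, a consistent win would require the VNT run to dominate the matched baseline across repeated comparisons, not in a single pairing.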
- train.py: baseline/VNT training (stable/experimental impl switch)
- eval.py: evaluation entrypoint
- compare_runs.py: stable vs experimental log comparison
- pareto_report.py: same-quality lower-compute report
- pretrain.py: from-scratch decoder-only pretraining
- evaluate_pretrain.py: checkpoint evaluator for pretraining
- tokenizer_build.py: local vocab builder
- pretrain_config.yaml: from-scratch config
- pretrain_config_round1.yaml: Round-1 ablation (RoPE + QK-Norm)
- baseline_transformer.py: matched baseline model
- vnt_transformer.py: experimental VNT implementation
- vnt_transformer_stable.py: stable VNT implementation
- VNT-Architecture-Guide.md: architecture notes
- VNT-Architecture-Diagram-EN.svg: architecture diagram
By default, configs use local data:
./input.txt
No HuggingFace download is required when data.text_file is set.
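
As an illustration of the local-data path, the sketch below reads the text file named by a data.text_file entry in an already-parsed config dictionary. Only the key data.text_file comes from this README. The function name, the dictionary layout, and the error behaviour are assumptions and may not match what train.py actually does.

```python
from pathlib import Path


def load_local_text(config: dict) -> str:
    """Return the corpus named by config["data"]["text_file"].

    Assumes the YAML config has already been parsed into nested dicts.
    """
    text_file = config.get("data", {}).get("text_file")
    if not text_file:
        # Assumption: the real pipeline may fall back to another data
        # source here; this sketch simply refuses to continue.
        raise ValueError("data.text_file is not set")
    path = Path(text_file)
    if not path.is_file():
        raise FileNotFoundError(f"local data file not found: {path}")
    return path.read_text(encoding="utf-8")
```

For example, load_local_text({"data": {"text_file": "./input.txt"}}) would return the contents of the default local file if it exists.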
pip install -r requirements-vnt.txt

Baseline:
python train.py --config config.yaml --model baseline --impl stable

VNT stable:
python train.py --config config_vnt_stable.yaml --model vnt --impl stable

VNT experimental:
python train.py --config config_vnt_experimental.yaml --model vnt --impl experimental

- Build vocab:
python tokenizer_build.py --input ./input.txt --out ./output/pretrain_vocab.json --max-vocab 8000 --min-freq 1 --lowercase
- Train with CORE early-stop:
python pretrain.py --config pretrain_config.yaml --core-threshold 0.256525
- Evaluate checkpoint:
python evaluate_pretrain.py --ckpt ./output/pretrain_checkpoints/pretrain_core_target.pt --text ./input.txt --seq-len 256 --iters 50 --batch-size 4

CORE definition:
- CORE = 1 / (1 + val_bpb)
- Training stops early when CORE >= threshold.
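
The CORE rule above can be sketched in a few lines. The function names are hypothetical, and the conversion from a token-level loss to bits per byte is only an assumption about how val_bpb might be obtained; pretrain.py may compute it differently.

```python
import math


def bits_per_byte(mean_nll_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Assumed conversion: mean per-token loss in nats -> bits per byte of text."""
    total_bits = mean_nll_nats * n_tokens / math.log(2)
    return total_bits / n_bytes


def core_score(val_bpb: float) -> float:
    """CORE = 1 / (1 + val_bpb); higher is better, at most 1 for val_bpb >= 0."""
    return 1.0 / (1.0 + val_bpb)


def should_stop(val_bpb: float, threshold: float) -> bool:
    """Early-stop condition from the definition: CORE >= threshold."""
    return core_score(val_bpb) >= threshold


def bpb_for_threshold(threshold: float) -> float:
    """Largest val_bpb that still satisfies CORE >= threshold."""
    return 1.0 / threshold - 1.0
```

With the threshold used in the pretraining command, bpb_for_threshold(0.256525) is roughly 2.898, so under this definition training would stop once validation bits per byte falls to about that value or lower.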
- Standard sweep:
python context_sweep.py --lengths 128,256,512,1024 --steps 1200
- Stabilized sweep templates:
python context_sweep_stable.py --lengths 128,256,512
- microgpt.py and v2.py are educational scripts inspired by minimal GPT implementations.
- This repo is a prototype and may change rapidly.