TinyGPT-30K

A small Transformer-based language model built from scratch in PyTorch and trained on CPU-only hardware.

This project was created to understand how modern language models work at a fundamental level. Instead of using pre-trained models or external APIs, every major component was implemented manually, including tokenization, self-attention, Transformer blocks, training, and text generation.

The model was trained on the Tiny Shakespeare dataset and generates Shakespeare-like text at the character level.

Why I Built This

I wanted to understand how language models work internally rather than relying on pre-trained models or APIs. My goal was to build a complete Transformer from scratch that could be trained on modest hardware and still demonstrate the core principles used in modern language models.

Features

Character-level language modeling
Transformer architecture with causal self-attention
Multi-head attention
Positional embeddings
Autoregressive text generation
CPU-only training and inference
Implemented entirely in PyTorch

Model Architecture

Configuration

Parameter	Value
Embedding Size	32
Attention Heads	2
Transformer Layers	2
Context Length	64
Vocabulary Type	Character-level
Parameters	31,553

Architecture Flow

Input Characters
        ↓
Token Embeddings
        +
Positional Embeddings
        ↓
2 Transformer Blocks
    ├─ Multi-Head Self-Attention
    ├─ Feed Forward Network
    ├─ Residual Connections
    └─ Layer Normalization
        ↓
Language Modeling Head
        ↓
Predicted Next Character

Training

Hardware

CPU: Intel Core i5-3320M
Cores / Threads: 2 Cores, 4 Threads
RAM: 8 GB
Operating System: Debian 13 (Trixie)

Training Configuration

Parameter	Value
Optimizer	AdamW
Learning Rate	1e-3
Batch Size	16
Training Steps	3000
Device	CPU

Results

Metric	Value
Initial Loss	4.41
Final Loss	2.11
Parameters	31,553

The loss decreased steadily during training, indicating that the model successfully learned character patterns, word boundaries, punctuation usage, and text structure from the training corpus.

Example Output

Walive neall tas o how no ger shat,
Yors lag ce sveflid nasthbl prou, at
am knoffet toue as arve tre, hrom Anjandke ass
Whainener can ate allles ther ireg; ofe is,
That iver, mel: pre thabl boourds:

Although the generated text is not fully coherent, it demonstrates that the model learned aspects of English spelling, formatting, capitalization, and Shakespeare-like structure.

Project Structure

tinygpt/
│
├── data/
│   └── corpus.txt
│
├── dataset.py
├── model.py
├── train.py
├── generate.py
│
├── checkpoints/
│
└── README.md

What I Learned

Through this project, I gained practical experience with:

Neural network fundamentals
Transformer architecture
Self-attention mechanisms
Tokenization and vocabulary construction
Language model training
Loss functions and optimization
Text generation
PyTorch development
Running machine learning workloads on limited hardware

Future Improvements

Potential improvements include:

Weight tying
Dropout regularization
Checkpoint resume support
Better datasets
Subword tokenization
Larger model sizes
Validation loss tracking
Training visualizations

Conclusion

This project demonstrates that even a small Transformer with only 31,553 parameters can learn meaningful patterns from text. Building and training the model from scratch provided a practical understanding of the core ideas behind modern language models while remaining lightweight enough to run on older consumer hardware.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TinyGPT-30K

Why I Built This

Features

Model Architecture

Configuration

Architecture Flow

Training

Hardware

Training Configuration

Results

Example Output

Project Structure

What I Learned

Future Improvements

Conclusion

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
README.md		README.md
dataset.py		dataset.py
generate.py		generate.py
model.py		model.py
train.py		train.py

Folders and files

Latest commit

History

Repository files navigation

TinyGPT-30K

Why I Built This

Features

Model Architecture

Configuration

Architecture Flow

Training

Hardware

Training Configuration

Results

Example Output

Project Structure

What I Learned

Future Improvements

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages