
builtbyashwin/TinyGPT-30k


TinyGPT-30K

A small Transformer-based language model built from scratch in PyTorch and trained on CPU-only hardware.

This project was created to understand how modern language models work at a fundamental level. Instead of using pre-trained models or external APIs, every major component was implemented manually, including tokenization, self-attention, Transformer blocks, training, and text generation.

The model was trained on the Tiny Shakespeare dataset and generates Shakespeare-like text at the character level.

Why I Built This

I wanted to understand how language models work internally rather than relying on pre-trained models or APIs. My goal was to build a complete Transformer from scratch that could be trained on modest hardware and still demonstrate the core principles used in modern language models.

Features

  • Character-level language modeling
  • Transformer architecture with causal self-attention
  • Multi-head attention
  • Positional embeddings
  • Autoregressive text generation
  • CPU-only training and inference
  • Implemented entirely in PyTorch
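A character-level tokenizer can be as simple as giving every distinct character in the corpus its own integer id. The sketch below assumes the vocabulary is the sorted set of characters in the training text. The names `build_vocab`, `encode`, and `decode` are illustrative and not necessarily the ones used in `dataset.py`.

```python
def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Assign every distinct character in the corpus an integer id."""
    chars = sorted(set(text))                      # deterministic ordering
    stoi = {ch: i for i, ch in enumerate(chars)}   # character -> id
    itos = {i: ch for ch, i in stoi.items()}       # id -> character
    return stoi, itos


def encode(text: str, stoi: dict[str, int]) -> list[int]:
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in text]


def decode(ids: list[int], itos: dict[int, str]) -> str:
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)


corpus = "First Citizen:\nBefore we proceed any further, hear me speak."
stoi, itos = build_vocab(corpus)
ids = encode("hear me", stoi)
print(len(stoi), ids)
assert decode(ids, itos) == "hear me"  # encoding is lossless
```

Because the vocabulary is built only from the corpus, `encode` raises a `KeyError` for any character that never appears in the training text; that is an accepted limitation of this simple scheme.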

Model Architecture

Configuration

| Parameter | Value |
|---|---|
| Embedding size | 32 |
| Attention heads | 2 |
| Transformer layers | 2 |
| Context length | 64 characters |
| Vocabulary type | Character-level |
| Parameters | 31,553 |
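The 31,553 figure can be reproduced by hand under a few assumptions that the README does not state: a 65-character vocabulary (the size of the commonly used Tiny Shakespeare file), query/key/value projections without bias, a feed-forward layer that expands to 4× the embedding size, a final LayerNorm, and an output head with a bias. This is one layout consistent with the reported count, not necessarily the exact structure of `model.py`; the helper `count_parameters` is hypothetical.

```python
def count_parameters(vocab_size: int, n_embd: int, n_layer: int, block_size: int) -> int:
    """Count trainable parameters for a small GPT-style model (assumed layout)."""
    token_emb = vocab_size * n_embd   # token embedding table
    pos_emb = block_size * n_embd     # learned positional embeddings

    # Per block. Q, K, V without bias: all heads together span n_embd columns,
    # so the number of heads does not change the count.
    qkv = 3 * n_embd * n_embd
    attn_proj = n_embd * n_embd + n_embd                   # output projection + bias
    ffn = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    layer_norms = 2 * (2 * n_embd)                         # two LayerNorms (weight + bias)
    per_block = qkv + attn_proj + ffn + layer_norms

    final_ln = 2 * n_embd
    lm_head = n_embd * vocab_size + vocab_size             # linear head + bias
    return token_emb + pos_emb + n_layer * per_block + final_ln + lm_head


total = count_parameters(vocab_size=65, n_embd=32, n_layer=2, block_size=64)
print(f"{total:,}")  # 31,553
```

Under these assumptions each Transformer block contributes 12,608 parameters, so the two blocks account for roughly 80% of the model; the embeddings and output head make up the rest.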

Architecture Flow

Input Characters
        ↓
Token Embeddings
        +
Positional Embeddings
        ↓
2 Transformer Blocks
    ├─ Multi-Head Self-Attention
    ├─ Feed Forward Network
    ├─ Residual Connections
    └─ Layer Normalization
        ↓
Language Modeling Head
        ↓
Predicted Next Character
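The flow above can be sketched in PyTorch as follows. This is a minimal sketch, not the code in `model.py`: it assumes pre-norm blocks (LayerNorm applied before attention and before the feed-forward network), a ReLU feed-forward layer of width 4 × 32, a final LayerNorm, and a 65-character vocabulary. Under those assumptions it has exactly 31,553 parameters, matching the table.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class Head(nn.Module):
    """One head of causal (masked) self-attention."""

    def __init__(self, n_embd: int, head_size: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask: position t may attend only to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.shape[1]
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T), scaled
        scores = scores.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ v                    # (B, T, head_size)


class MultiHeadAttention(nn.Module):
    """Several heads in parallel, concatenated and projected back to n_embd."""

    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        head_size = n_embd // n_head
        self.heads = nn.ModuleList([Head(n_embd, head_size, block_size) for _ in range(n_head)])
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))


class Block(nn.Module):
    """Transformer block: attention and feed-forward, each with a residual connection."""

    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = MultiHeadAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.ReLU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln1(x))     # residual around attention
        return x + self.ffwd(self.ln2(x))  # residual around feed-forward


class TinyGPT(nn.Module):
    def __init__(self, vocab_size: int, n_embd: int = 32, n_head: int = 2,
                 n_layer: int = 2, block_size: int = 64):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, n_embd)
        self.position_embedding = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)  # language modeling head

    def forward(self, idx: torch.Tensor, targets: Optional[torch.Tensor] = None):
        pos = torch.arange(idx.shape[1], device=idx.device)
        x = self.token_embedding(idx) + self.position_embedding(pos)  # (B, T, n_embd)
        logits = self.lm_head(self.ln_f(self.blocks(x)))              # (B, T, vocab_size)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        return logits, loss


model = TinyGPT(vocab_size=65)
print(sum(p.numel() for p in model.parameters()))  # 31553 under these assumptions
```

The causal mask is registered as a buffer rather than a parameter, so it moves with the model between devices but is not trained and does not count toward the parameter total.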

Training

Hardware

  • CPU: Intel Core i5-3320M
  • Cores / Threads: 2 Cores, 4 Threads
  • RAM: 8 GB
  • Operating System: Debian 13 (Trixie)

Training Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 1e-3 |
| Batch size | 16 |
| Training steps | 3,000 |
| Device | CPU |
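A training loop with these settings might look like the sketch below. It assumes batches are random 64-character windows of the encoded corpus, with targets shifted one character to the right. To keep the example self-contained, `BigramStandIn` is a hypothetical placeholder for the project's Transformer; any module that returns a scalar loss from `(inputs, targets)` would fit. The function names are illustrative, not taken from `train.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def get_batch(data: torch.Tensor, batch_size: int, block_size: int):
    """Sample random windows; targets are the inputs shifted by one character."""
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + block_size + 1] for i in ix])
    return x, y


class BigramStandIn(nn.Module):
    """Hypothetical stand-in: predicts the next character from the current one only."""

    def __init__(self, vocab_size: int):
        super().__init__()
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        logits = self.table(idx)  # (B, T, vocab_size)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))


def train(model, data, steps=3000, batch_size=16, block_size=64, lr=1e-3):
    """AdamW training loop using the hyperparameters from the table above."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        xb, yb = get_batch(data, batch_size, block_size)
        loss = model(xb, yb)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
text = "to be or not to be, that is the question. " * 50
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)
losses = train(BigramStandIn(len(stoi)), data, steps=500)  # shortened run for the demo
print(f"first loss {losses[0]:.2f}, last loss {losses[-1]:.2f}")
```

Sampling windows at random offsets, rather than walking through the corpus in order, gives every step a fresh mix of contexts at almost no cost, which suits short CPU-only runs.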

Results

| Metric | Value |
|---|---|
| Initial loss | 4.41 |
| Final loss | 2.11 |
| Parameters | 31,553 |

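The training loss decreased steadily from 4.41 to 2.11. Validation loss was not tracked, so this measures fit to the training corpus rather than generalization, but together with the generated samples it indicates that the model learned character patterns, word boundaries, punctuation usage, and text structure from the corpus.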
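Two quick calculations help put these loss values in context, assuming PyTorch's natural-log cross-entropy and a 65-character vocabulary (the size of the standard Tiny Shakespeare file). A model that spreads probability uniformly over the vocabulary scores ln(65) ≈ 4.17, close to the reported initial loss; and exp(loss), the perplexity, roughly measures how many characters the model is effectively choosing between at each step.

```python
import math

VOCAB_SIZE = 65  # assumption: distinct characters in the standard Tiny Shakespeare file

uniform_loss = math.log(VOCAB_SIZE)    # loss of a model that guesses uniformly
initial_perplexity = math.exp(4.41)    # from the reported initial loss
final_perplexity = math.exp(2.11)      # from the reported final loss

print(f"uniform baseline loss: {uniform_loss:.2f}")        # 4.17
print(f"initial perplexity:    {initial_perplexity:.1f}")  # 82.3
print(f"final perplexity:      {final_perplexity:.1f}")    # 8.2
```

On this reading, training narrowed the model's effective choice from about 82 candidates per character to about 8. A freshly initialized network usually starts somewhat above the uniform baseline because its random logits are not perfectly flat.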

Example Output

Walive neall tas o how no ger shat,
Yors lag ce sveflid nasthbl prou, at
am knoffet toue as arve tre, hrom Anjandke ass
Whainener can ate allles ther ireg; ofe is,
That iver, mel: pre thabl boourds:

Although the generated text is not coherent and most words are invented, it shows that the model learned aspects of English spelling (short real words such as "how", "can", and "That" appear), line-based formatting, capitalization, punctuation, and Shakespeare-like structure.
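Samples like the one above come from autoregressive generation: the model predicts a distribution over the next character, one character is sampled, appended, and fed back in. A minimal sketch, assuming the model can be wrapped as a function from ids `(B, T)` to logits `(B, T, vocab_size)` and that the context is cropped to the 64-character window. The `temperature` parameter and the `uniform_logits` stand-in are illustrative additions, not features documented for `generate.py`.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def generate(logits_fn, idx: torch.Tensor, max_new_tokens: int,
             block_size: int = 64, temperature: float = 1.0) -> torch.Tensor:
    """Extend idx (B, T) by sampling one character id at a time."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]              # keep only the last block_size ids
        logits = logits_fn(idx_cond)[:, -1, :]       # logits for the next position only
        probs = F.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # (B, 1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx


def uniform_logits(ids: torch.Tensor) -> torch.Tensor:
    """Stand-in model: equal logits for all 65 characters."""
    return torch.zeros(ids.shape[0], ids.shape[1], 65)


out = generate(uniform_logits, torch.zeros((1, 1), dtype=torch.long), max_new_tokens=100)
print(out.shape)  # torch.Size([1, 101])
```

Cropping to `block_size` matters because learned positional embeddings only exist for positions 0 to 63. Temperatures below 1.0 sharpen the distribution and tend to produce more conservative text.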

Project Structure

tinygpt/
│
├── data/
│   └── corpus.txt
│
├── dataset.py
├── model.py
├── train.py
├── generate.py
│
├── checkpoints/
│
└── README.md

What I Learned

Through this project, I gained practical experience with:

  • Neural network fundamentals
  • Transformer architecture
  • Self-attention mechanisms
  • Tokenization and vocabulary construction
  • Language model training
  • Loss functions and optimization
  • Text generation
  • PyTorch development
  • Running machine learning workloads on limited hardware

Future Improvements

Potential improvements include:

  • Weight tying
  • Dropout regularization
  • Checkpoint resume support
  • Better datasets
  • Subword tokenization
  • Larger model sizes
  • Validation loss tracking
  • Training visualizations
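Weight tying, the first item above, shares one matrix between the token embedding and the output head. A minimal sketch, assuming sizes of 65 characters and 32 dimensions and a head implemented as `nn.Linear` (the variable names here are hypothetical): `nn.Linear(n_embd, vocab_size).weight` has shape `(vocab_size, n_embd)`, the same as the embedding table, so the head can reuse it directly. If the current model does not tie weights, this would remove 65 × 32 = 2,080 parameters, lowering the count from 31,553 to 29,473.

```python
import torch.nn as nn

vocab_size, n_embd = 65, 32  # assumed sizes

token_embedding = nn.Embedding(vocab_size, n_embd)  # weight: (vocab_size, n_embd)
lm_head = nn.Linear(n_embd, vocab_size)             # weight: (vocab_size, n_embd)

# Tie: the head now uses the same Parameter object as the embedding table
lm_head.weight = token_embedding.weight

model = nn.ModuleDict({"token_embedding": token_embedding, "lm_head": lm_head})
n_params = sum(p.numel() for p in model.parameters())  # shared tensor counted once
print(n_params)  # 2145 = 2080 shared weights + 65 head bias
```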

Conclusion

This project demonstrates that even a small Transformer with only 31,553 parameters can learn meaningful patterns from text. Building and training the model from scratch provided a practical understanding of the core ideas behind modern language models while remaining lightweight enough to run on older consumer hardware.
