---
title: MiniGPT-from-Scratch
emoji: 🚀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.16.2
app_file: app.py
pinned: false
---
A from-scratch implementation of a modern, decoder-only Transformer language model, built in 30 days. This project is a deep dive into state-of-the-art LLM engineering, moving from the classic GPT-2 design to a modern Llama-style architecture.
## Features

- Custom BPE Tokenizer: fully trained on the dataset using Hugging Face `tokenizers`.
- Modern Architecture: multi-head causal self-attention, SwiGLU MLPs, RMSNorm (Pre-Norm), and Rotary Positional Embeddings (RoPE).
- Optimized Training: PyTorch Flash Attention (A100/V100), AMP (mixed precision), cosine decay with warmup, and gradient accumulation.
- Interactive Demo: built-in Gradio web app for real-time text generation.
- Evaluation: integrated perplexity calculation for model benchmarking.
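The evaluation metric above reduces to a simple formula: perplexity is the exponential of the mean per-token cross-entropy loss (in nats). A minimal sketch in plain Python (the repo's actual implementation presumably operates on PyTorch loss tensors):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_nlls` holds the cross-entropy loss (in nats) for each
    token in the evaluation set.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning every token probability 1/e has NLL 1.0 per token,
# so its perplexity is e ≈ 2.718.
print(perplexity([1.0, 1.0, 1.0]))  # → 2.718281828459045
```

A perfect model (probability 1 on every token, NLL 0) scores the minimum perplexity of 1.0.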
## Tech Stack

- Language: Python 3.10+
- Deep Learning: PyTorch
- Tokenization: Hugging Face Tokenizers
- Interface: Gradio
- Data: FineWeb-Edu (sample)
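The tokenizer is trained with the Hugging Face `tokenizers` library, but the BPE algorithm underneath is simple: repeatedly fuse the most frequent adjacent symbol pair. A toy pure-Python sketch of that training loop (illustrative only, not the library's implementation):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, return the top one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Fuse every occurrence of `pair` into a single new symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: character-level words with frequencies.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}
for _ in range(2):  # learn two merges: ("l","o") then ("lo","w")
    vocab = merge_pair(vocab, most_frequent_pair(vocab))
print(vocab)  # "low" is now a single symbol in both words
```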
## Project Structure

```
MiniGPT/
├── data/              # Raw and processed datasets
├── notebooks/         # Colab/Kaggle training templates
├── src/
│   ├── datasets/      # Data loading and preprocessing logic
│   ├── model/         # Transformer architecture (GPT, Attention, Blocks)
│   ├── tokenizer/     # BPE training and wrapper
│   ├── train/         # Training loop with AMP and validation
│   └── app.py         # Gradio web demo
├── checkpoints/       # Saved model weights (.pt)
└── requirements.txt   # Project dependencies
```
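Two of the Llama-style pieces that live under `src/model/` — RMSNorm and the SwiGLU MLP — are small enough to sketch in a few lines of PyTorch. Module and dimension names here are illustrative, not the repo's actual ones:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: no mean subtraction, no bias (vs. LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Llama-style gated MLP: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 8, 256)             # (batch, seq, d_model)
y = SwiGLU(256, 682)(RMSNorm(256)(x))  # pre-norm, then gated MLP
print(y.shape)  # torch.Size([2, 8, 256])
```

Pre-Norm means each sublayer sees normalized input and its output is added back to the residual stream, which is what makes deeper stacks train stably.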
## Quickstart

Clone the repository and install dependencies:

```bash
git clone https://github.com/mrshibly/MiniGPT-from-Scratch.git
cd MiniGPT-from-Scratch
pip install -r requirements.txt
```

Download and prepare the data, then train the tokenizer:

```bash
python src/datasets/download_fineweb.py
python src/datasets/clean_text.py
python src/tokenizer/train_tokenizer.py
python src/datasets/prepare_data.py
```

To train locally (CPU/GPU):

```bash
python src/train/train.py
```

Note: For full 50M training, use the Colab Template.
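The cosine-decay-with-warmup schedule used during training can be sketched as a plain function of the step count. The hyperparameter values here are placeholders, not the repo's actual settings:

```python
import math

def lr_at(step, max_lr=3e-4, min_lr=3e-5, warmup=100, total=1000):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    if step >= total:
        return min_lr
    t = (step - warmup) / (total - warmup)       # 0 → 1 over the decay phase
    coeff = 0.5 * (1.0 + math.cos(math.pi * t))  # 1 → 0, cosine-shaped
    return min_lr + coeff * (max_lr - min_lr)

print(lr_at(0))     # 3e-06  (start of warmup)
print(lr_at(99))    # 0.0003 (peak, end of warmup)
print(lr_at(1000))  # 3e-05  (floor after decay)
```

In practice the training loop calls this once per optimizer step and writes the result into each param group's `lr` before `optimizer.step()`.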
## Run the Demo

Once you have a checkpoint at `checkpoints/ckpt.pt`:

```bash
python src/app.py
```

## Model Configurations

| Config | Params | Layers | Heads | d_model | Target Device |
|---|---|---|---|---|---|
| Tiny | ~7M | 4 | 4 | 256 | CPU/Laptop |
| Standard | ~50M | 6 | 8 | 512 | Free GPU |
| Stretch | ~124M | 12 | 12 | 768 | Colab Pro (A100) |
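Parameter counts like these can be sanity-checked with the standard rough formula for a GPT-2-style decoder: about 12·L·d² for the transformer blocks plus V·d for tied token embeddings. The vocabulary size V = 50,257 (GPT-2's) and the 4× MLP assumed here are this sketch's assumptions — the repo's BPE vocab and SwiGLU expansion factor may differ, shifting the estimate:

```python
def approx_params(n_layers: int, d_model: int, vocab: int = 50_257) -> int:
    """Rough decoder-only parameter count.

    Per block: attention QKV + output proj ≈ 4*d^2, a 4x MLP ≈ 8*d^2,
    giving 12*d^2. Plus tied token embeddings: vocab * d.
    Norm weights and biases are ignored.
    """
    return 12 * n_layers * d_model**2 + vocab * d_model

# Stretch config from the table: 12 layers, d_model 768.
print(f"{approx_params(12, 768) / 1e6:.1f}M")  # 123.5M, matching the ~124M row
```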
## Stretch Goal

- Dataset: 10 GB of FineWeb-Edu (target)
- Parameters: 124 million
- Optimization: FlashAttention-2, enabled via Colab Pro
- Status: scaling up toward GPT-2 Small-equivalent capability
## License

MIT