GPT-2 from Scratch

This repo follows Sebastian Raschka's guide on building a large language model (GPT-2 small) from scratch with PyTorch.

Each of the Jupyter notebooks represents a different stage of building the LLM:

  • embeddings.ipynb: builds a DataLoader for processing the raw text. The raw text is broken into tokens, the tokens are converted into token embeddings, and positional embeddings are added to them to produce the input embeddings (sketched below)
  • attention.ipynb: builds the attention mechanisms that let each token take the previous tokens of an input into context. Implements simplified self-attention, self-attention with trainable weights, causal attention, and multi-head attention (sketched below)
  • implementation.ipynb: implements the basic LLM architecture: layer normalization, a feed-forward module with GELU activation, shortcut connections, and the transformer block (sketched below)
  • training.ipynb: measures the text-generation loss (cross entropy and backpropagation), trains the model on a corpus (the_verdict.txt), controls the randomness of text generation (temperature scaling and top-k sampling), and loads the pretrained GPT-2 weights from OpenAI (loss and sampling sketched below)
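
The following is a minimal sketch of the embeddings pipeline, assuming tiktoken's GPT-2 tokenizer and a sliding-window dataset. Class and variable names are illustrative and may not match the notebooks or utils.py exactly.

```python
import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader


class GPTDataset(Dataset):
    """Slides a fixed-size window over the token IDs; each target is the input shifted by one token."""

    def __init__(self, text, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]


tokenizer = tiktoken.get_encoding("gpt2")  # GPT-2 BPE tokenizer
with open("the_verdict.txt", "r", encoding="utf-8") as f:
    raw_text = f.read()

context_length = 4  # small value for illustration; GPT-2 uses 1024
loader = DataLoader(
    GPTDataset(raw_text, tokenizer, max_length=context_length, stride=context_length),
    batch_size=8, shuffle=True, drop_last=True,
)

tok_emb = torch.nn.Embedding(50257, 768)           # token embeddings (GPT-2 vocab size, embedding dim)
pos_emb = torch.nn.Embedding(context_length, 768)  # positional embeddings

inputs, targets = next(iter(loader))                                        # shape: (8, 4)
input_embeddings = tok_emb(inputs) + pos_emb(torch.arange(context_length))  # shape: (8, 4, 768)
```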
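
A minimal sketch of causal multi-head attention in the spirit of the attention notebook: queries, keys, and values are projected with trainable weights, split into heads, and future positions are masked out. Again, the names are illustrative.

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.1, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask hides future tokens (causal attention)
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project, then split the last dimension into heads: (b, heads, tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:num_tokens, :num_tokens], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))

        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)  # merge heads back
        return self.out_proj(context)
```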
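
A minimal sketch of the transformer block: normalization before the attention and feed-forward sub-layers, a GELU feed-forward expansion, and shortcut connections around both. It reuses the MultiHeadAttention class from the previous sketch, and uses nn.LayerNorm in place of a hand-written normalization layer.

```python
import torch.nn as nn


class FeedForward(nn.Module):
    """Expands to 4x the embedding size, applies GELU, and projects back down."""

    def __init__(self, emb_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )

    def forward(self, x):
        return self.layers(x)


class TransformerBlock(nn.Module):
    def __init__(self, emb_dim, context_length, num_heads, dropout=0.1):
        super().__init__()
        self.att = MultiHeadAttention(emb_dim, emb_dim, context_length, num_heads, dropout)
        self.ff = FeedForward(emb_dim)
        self.norm1 = nn.LayerNorm(emb_dim)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        x = x + self.drop(self.att(self.norm1(x)))  # shortcut connection around attention
        x = x + self.drop(self.ff(self.norm2(x)))   # shortcut connection around feed-forward
        return x
```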
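
Finally, a minimal sketch of the loss and sampling logic covered in training.ipynb: cross-entropy loss over shifted targets, plus temperature scaling and top-k sampling during generation. The calc_loss and generate helpers and their parameters are illustrative, not the exact functions from the notebook.

```python
import torch


def calc_loss(logits, targets):
    # Cross-entropy between next-token logits (b, tokens, vocab) and shifted targets (b, tokens)
    return torch.nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())


def generate(model, idx, max_new_tokens, context_length, temperature=1.0, top_k=None):
    for _ in range(max_new_tokens):
        logits = model(idx[:, -context_length:])[:, -1, :]  # logits for the last position only
        if top_k is not None:
            # Top-k sampling: everything below the k-th largest logit is ruled out
            top_vals, _ = torch.topk(logits, top_k)
            logits = logits.masked_fill(logits < top_vals[:, -1:], float("-inf"))
        if temperature != 1.0:
            logits = logits / temperature  # temperature scaling: <1 sharpens, >1 flattens
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample the next token ID
        idx = torch.cat([idx, next_token], dim=1)              # append and continue generating
    return idx
```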

utils.py contains the classes used across the different notebooks, since most of them are reused.
