This repo follows Sebastian Raschka's guide to building a large language model (GPT-2 small) from scratch with PyTorch.
Each of the Jupyter notebooks represents a different stage of building the LLM (minimal code sketches illustrating each stage follow the list):
embeddings.ipynb
: Create a DataLoader for processing the raw text. The raw text is broken into tokens, the tokens are converted into token embeddings, and these are added to positional embeddings to produce the input embeddings.
attention.ipynb
: The multi-head attention module that takes the previous words of an input into context. Implements simplified self-attention, self-attention with trainable weights, causal attention, and multi-head attention.
implementation.ipynb
: Implement the basic LLM architecture: layer normalization, the FeedForward module (GELU), shortcut connections, and the transformer block.
training.ipynb
: Measure the text-generation loss, train the model via backpropagation on a corpus (the_verdict.txt), control the randomness of text generation (temperature scaling and top-k sampling), and load the pretrained GPT-2 weights from OpenAI.
utils.py
: Contains the code for the various classes used across the different notebooks, since most of them are reused.
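
As a rough illustration of the embeddings.ipynb stage, here is a minimal sketch of tokenizing a string with tiktoken's GPT-2 encoding and combining token and positional embeddings into input embeddings. The dimensions used (vocab_size=50257, emb_dim=768, context_length=1024) are illustrative assumptions, not necessarily the notebook's exact settings.

```python
import torch
import tiktoken

# GPT-2 BPE tokenizer; the sample sentence is arbitrary.
tokenizer = tiktoken.get_encoding("gpt2")
token_ids = torch.tensor([tokenizer.encode("Every effort moves you")])   # (1, seq_len)

vocab_size, emb_dim, context_length = 50257, 768, 1024                   # assumed sizes
tok_emb = torch.nn.Embedding(vocab_size, emb_dim)
pos_emb = torch.nn.Embedding(context_length, emb_dim)

token_embeddings = tok_emb(token_ids)                                    # (1, seq_len, emb_dim)
pos_embeddings = pos_emb(torch.arange(token_ids.shape[1]))               # (seq_len, emb_dim)
input_embeddings = token_embeddings + pos_embeddings                     # broadcast over the batch
```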
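
For the attention.ipynb stage, the sketch below shows single-head causal self-attention with trainable query/key/value projections; an upper-triangular mask keeps each position from attending to later tokens. The class name and hyperparameters are placeholders, not a copy of the notebook.

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Single-head causal self-attention with trainable W_q, W_k, W_v (illustrative)."""
    def __init__(self, d_in, d_out, context_length, dropout=0.0):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key   = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask: True above the diagonal marks "future" positions.
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):                              # x: (batch, seq_len, d_in)
        seq_len = x.shape[1]
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)                 # (batch, seq_len, seq_len)
        scores = scores.masked_fill(self.mask[:seq_len, :seq_len], float("-inf"))
        weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)
        return self.dropout(weights) @ v               # (batch, seq_len, d_out)
```

Multi-head attention then repeats this with several sets of projections and concatenates the per-head outputs.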
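
For implementation.ipynb, this sketch wires layer normalization, a GELU feed-forward module, and shortcut (residual) connections into a pre-norm transformer block. It uses torch.nn.MultiheadAttention as a stand-in for a hand-rolled multi-head attention module, and the sizes (emb_dim=768, num_heads=12) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm block: LayerNorm -> attention -> shortcut, then LayerNorm -> FeedForward -> shortcut."""
    def __init__(self, emb_dim=768, num_heads=12, context_length=1024, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.attn  = nn.MultiheadAttention(emb_dim, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(                       # FeedForward: expand 4x, GELU, project back
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )
        self.drop = nn.Dropout(dropout)
        # Boolean causal mask: True entries are blocked (future positions).
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):                              # x: (batch, seq_len, emb_dim)
        seq_len = x.shape[1]
        shortcut = x                                   # first shortcut connection
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:seq_len, :seq_len], need_weights=False)
        x = shortcut + self.drop(h)
        shortcut = x                                   # second shortcut connection
        x = shortcut + self.drop(self.ff(self.norm2(x)))
        return x
```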
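
Finally, for the decoding part of training.ipynb, here is a sketch of controlling generation randomness with temperature scaling and top-k sampling; the function name sample_next_token and its default values are hypothetical. The text-generation loss itself is the cross entropy between the model's logits and the target token ids, indicated only in a comment here.

```python
import torch

# Training loss (for reference): cross entropy between the logits and the targets
# (the inputs shifted one position to the left), e.g.
#   loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())

def sample_next_token(logits, temperature=1.0, top_k=50):
    """Pick the next token id from a (vocab_size,) logits vector (illustrative helper)."""
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        # Set everything below the k-th highest logit to -inf so it can never be sampled.
        logits = torch.where(logits < top_logits[-1], torch.tensor(float("-inf")), logits)
    if temperature > 0.0:
        probs = torch.softmax(logits / temperature, dim=-1)   # higher temperature -> flatter distribution
        return torch.multinomial(probs, num_samples=1)        # probabilistic sampling
    return torch.argmax(logits, dim=-1, keepdim=True)         # temperature 0 -> greedy decoding

torch.manual_seed(123)
dummy_logits = torch.randn(50257)                             # stand-in for one step of model output
next_id = sample_next_token(dummy_logits, temperature=0.8, top_k=25)
```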