This repo follows Sebastian Raschka's guide to building a large language model (GPT-2 small) from scratch with PyTorch.
Each of the Jupyter notebooks represents a different stage of building the LLM (minimal code sketches illustrating each stage follow the list):
embeddings.ipynb
: Create a DataLoader for processing the raw text. The raw text is broken into tokens, the tokens are converted into token embeddings, and these are added to positional embeddings to produce the input embeddings.
attention.ipynb
: The multi-head attention module that takes the previous words of an input into context. Implements simplified self-attention, self-attention with trainable weights, causal attention, and multi-head attention.
implementation.ipynb
: Implement the basic LLM architecture: layer normalization, the FeedForward module (GELU), shortcut connections, and the transformer block.
training.ipynb
: Measure the text-generation loss, train the model via backpropagation on a corpus (the_verdict.txt), control the randomness of text generation (temperature scaling and top-k sampling), and load the pretrained GPT-2 weights from OpenAI.
utils.py
: Contains the code for the various classes used across the different notebooks, since most of them are reused.
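
As a rough illustration of the embeddings.ipynb stage, here is a minimal sketch of tokenizing a string with tiktoken's GPT-2 encoding and combining token and positional embeddings into input embeddings. The dimensions used (vocab_size=50257, emb_dim=768, context_length=1024) are illustrative assumptions, not necessarily the notebook's exact settings.

```python
import torch
import tiktoken

# GPT-2 BPE tokenizer; the sample sentence is arbitrary.
tokenizer = tiktoken.get_encoding("gpt2")
token_ids = torch.tensor([tokenizer.encode("Every effort moves you")])   # (1, seq_len)

vocab_size, emb_dim, context_length = 50257, 768, 1024                   # assumed sizes
tok_emb = torch.nn.Embedding(vocab_size, emb_dim)
pos_emb = torch.nn.Embedding(context_length, emb_dim)

token_embeddings = tok_emb(token_ids)                                    # (1, seq_len, emb_dim)
pos_embeddings = pos_emb(torch.arange(token_ids.shape[1]))               # (seq_len, emb_dim)
input_embeddings = token_embeddings + pos_embeddings                     # broadcast over the batch
```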
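
For the attention.ipynb stage, the sketch below shows single-head causal self-attention with trainable query/key/value projections; an upper-triangular mask keeps each position from attending to later tokens. The class name and hyperparameters are placeholders, not a copy of the notebook.

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Single-head causal self-attention with trainable W_q, W_k, W_v (illustrative)."""
    def __init__(self, d_in, d_out, context_length, dropout=0.0):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key   = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask: True above the diagonal marks "future" positions.
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):                              # x: (batch, seq_len, d_in)
        seq_len = x.shape[1]
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)                 # (batch, seq_len, seq_len)
        scores = scores.masked_fill(self.mask[:seq_len, :seq_len], float("-inf"))
        weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)
        return self.dropout(weights) @ v               # (batch, seq_len, d_out)
```

Multi-head attention then repeats this with several sets of projections and concatenates the per-head outputs.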
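
For implementation.ipynb, this sketch wires layer normalization, a GELU feed-forward module, and shortcut (residual) connections into a pre-norm transformer block. It uses torch.nn.MultiheadAttention as a stand-in for a hand-rolled multi-head attention module, and the sizes (emb_dim=768, num_heads=12) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm block: LayerNorm -> attention -> shortcut, then LayerNorm -> FeedForward -> shortcut."""
    def __init__(self, emb_dim=768, num_heads=12, context_length=1024, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.attn  = nn.MultiheadAttention(emb_dim, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(                       # FeedForward: expand 4x, GELU, project back
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )
        self.drop = nn.Dropout(dropout)
        # Boolean causal mask: True entries are blocked (future positions).
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):                              # x: (batch, seq_len, emb_dim)
        seq_len = x.shape[1]
        shortcut = x                                   # first shortcut connection
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:seq_len, :seq_len], need_weights=False)
        x = shortcut + self.drop(h)
        shortcut = x                                   # second shortcut connection
        x = shortcut + self.drop(self.ff(self.norm2(x)))
        return x
```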
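
Finally, for the decoding part of training.ipynb, here is a sketch of controlling generation randomness with temperature scaling and top-k sampling; the function name sample_next_token and its default values are hypothetical. The text-generation loss itself is the cross entropy between the model's logits and the target token ids, indicated only in a comment here.

```python
import torch

# Training loss (for reference): cross entropy between the logits and the targets
# (the inputs shifted one position to the left), e.g.
#   loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())

def sample_next_token(logits, temperature=1.0, top_k=50):
    """Pick the next token id from a (vocab_size,) logits vector (illustrative helper)."""
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        # Set everything below the k-th highest logit to -inf so it can never be sampled.
        logits = torch.where(logits < top_logits[-1], torch.tensor(float("-inf")), logits)
    if temperature > 0.0:
        probs = torch.softmax(logits / temperature, dim=-1)   # higher temperature -> flatter distribution
        return torch.multinomial(probs, num_samples=1)        # probabilistic sampling
    return torch.argmax(logits, dim=-1, keepdim=True)         # temperature 0 -> greedy decoding

torch.manual_seed(123)
dummy_logits = torch.randn(50257)                             # stand-in for one step of model output
next_id = sample_next_token(dummy_logits, temperature=0.8, top_k=25)
```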