This repository implements a Large Language Model (LLM) from scratch using Python. The goal is to understand the inner workings of LLMs by implementing the core components and techniques used in natural language processing (NLP).
A Bigram Language Model is a simple probabilistic model that predicts the next word in a sequence based on the previous word. It assumes that the probability of a word depends only on the word that came immediately before it.
A bigram refers to a pair of consecutive words in a sentence or text. For example:
📌 Sentence: "I love machine learning"
📌 Bigrams: ("I", "love")
, ("love", "machine")
, ("machine", "learning")
A bigram language model estimates P(w₂ | w₁), the probability of a word w₂ given the previous word w₁.
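As a quick illustration, here is a minimal Python sketch (variable names are illustrative) that extracts the bigrams of the example sentence above using simple whitespace tokenization:

```python
# Extract word-level bigrams from a sentence using whitespace tokenization.
sentence = "I love machine learning"
tokens = sentence.split()

# Pair each token with the token that immediately follows it.
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)  # [('I', 'love'), ('love', 'machine'), ('machine', 'learning')]
```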
A bigram model learns the conditional probabilities of word sequences from a corpus:

$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}, w_n)}{C(w_{n-1})}$$

where:

- $C(w_{n-1}, w_n)$ = count of the bigram $(w_{n-1}, w_n)$ in the corpus
- $C(w_{n-1})$ = count of the word $w_{n-1}$ in the corpus
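This is a maximum likelihood estimate from raw counts. Below is a minimal sketch of how it could be computed, assuming a tiny whitespace-tokenized toy corpus (the corpus and the `bigram_probability` helper are illustrative, not part of the repository):

```python
from collections import Counter

# Illustrative toy corpus, tokenized by whitespace.
corpus = "i love machine learning and i love nlp".split()

# C(w_{n-1}, w_n): count of each adjacent word pair.
bigram_counts = Counter(zip(corpus, corpus[1:]))

# C(w_{n-1}): count of each word in the "previous word" position.
prev_counts = Counter(corpus[:-1])

def bigram_probability(prev_word, word):
    """Maximum likelihood estimate of P(word | prev_word)."""
    if prev_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / prev_counts[prev_word]

print(bigram_probability("i", "love"))  # 1.0 in this toy corpus
```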
For the word "hello", we compute bigram probabilities by counting occurrences in a corpus:
| Bigram | Count | Total Count of Previous Character | Probability |
|---|---|---|---|
| h → e | 1 | 1 (h appears once) | 1/1 = 1.000 |
| e → l | 1 | 1 (e appears once) | 1/1 = 1.000 |
| l → l | 1 | 2 (l appears twice) | 1/2 = 0.500 |
| l → o | 1 | 2 (l appears twice) | 1/2 = 0.500 |
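The table above can be reproduced with a short character-level sketch (illustrative code, not tied to the repository's modules):

```python
from collections import Counter

word = "hello"

# Count character bigrams and occurrences of each character in the "previous" position.
bigram_counts = Counter(zip(word, word[1:]))
prev_counts = Counter(word[:-1])

# Print P(next | prev) for every bigram, matching the table above.
for (prev, nxt), count in bigram_counts.items():
    print(f"{prev} -> {nxt}: {count}/{prev_counts[prev]} = {count / prev_counts[prev]:.3f}")
```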
Using the learned bigram probabilities, we generate "hello" step by step:

- Start with "h"
- P(e | h) = 1.000 → "he"
- P(l | e) = 1.000 → "hel"
- P(l | l) = 0.500 → "hell"
- P(o | l) = 0.500 → "hello"