# LLM-from-Scratch

This repository is an implementation for building a **Large Language Model (LLM)** from scratch using Python. The goal is to understand the inner workings of LLMs by implementing core components and techniques used in natural language processing (NLP).

## Understanding Bigram Language Model

A Bigram Language Model is a simple probabilistic model that predicts the next word in a sequence based on the previous word. It assumes that the probability of a word depends only on the word that came immediately before it.

### 1. What is a Bigram?

A bigram refers to a pair of consecutive words in a sentence or text. For example:

📌 Sentence: "I love machine learning"
📌 Bigrams: ("I", "love"), ("love", "machine"), ("machine", "learning")

A bigram language model estimates P(w₂ | w₁), the probability of a word w₂ given the previous word w₁.
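As a quick illustration (a sketch, not code from this repository), word bigrams can be collected by pairing each word with the word that follows it:

```python
# Minimal sketch: extract word bigrams from a sentence.
sentence = "I love machine learning"
words = sentence.split()

# Pair each word with its successor.
bigrams = list(zip(words, words[1:]))
print(bigrams)
# [('I', 'love'), ('love', 'machine'), ('machine', 'learning')]
```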


### 2. How Does It Work?

A bigram model learns the conditional probabilities of word sequences from a corpus:

$P(w_n | w_{n-1}) = \frac{C(w_{n-1}, w_n)}{C(w_{n-1})}$

where:

- $C(w_{n-1}, w_n)$ = count of the bigram $(w_{n-1}, w_n)$ in the corpus
- $C(w_{n-1})$ = count of the word $w_{n-1}$ in the corpus
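A minimal Python sketch of this maximum-likelihood estimation, assuming a hypothetical toy corpus rather than the repository's actual training data:

```python
from collections import Counter

# Hypothetical toy corpus for illustration only.
corpus = "I love machine learning and I love NLP".split()

bigram_counts = Counter(zip(corpus, corpus[1:]))   # C(w_{n-1}, w_n)
prev_counts = Counter(corpus[:-1])                 # C(w_{n-1}), counted as a previous word

def bigram_prob(prev_word, word):
    """Maximum-likelihood estimate of P(word | prev_word)."""
    if prev_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / prev_counts[prev_word]

print(bigram_prob("I", "love"))        # 2 / 2 = 1.0
print(bigram_prob("love", "machine"))  # 1 / 2 = 0.5
```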

#### Example: Bigram Probabilities for "hello"

For the string "hello", we compute character-level bigram probabilities by counting each consecutive character pair:

| Bigram | Count | Total Count of Previous Character | Probability |
|--------|-------|-----------------------------------|-------------|
| h → e  | 1     | 1 (h appears once)                | 1/1 = 1.000 |
| e → l  | 1     | 1 (e appears once)                | 1/1 = 1.000 |
| l → l  | 1     | 2 (l appears twice)               | 1/2 = 0.500 |
| l → o  | 1     | 2 (l appears twice)               | 1/2 = 0.500 |
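The table can be reproduced with a short character-level count (a sketch; the repository's own implementation may differ):

```python
from collections import Counter

text = "hello"

pair_counts = Counter(zip(text, text[1:]))  # count of each character bigram
prev_counts = Counter(text[:-1])            # count of each "previous" character

for (c1, c2), count in pair_counts.items():
    prob = count / prev_counts[c1]
    print(f"{c1} -> {c2}: {count}/{prev_counts[c1]} = {prob:.3f}")
# h -> e: 1/1 = 1.000
# e -> l: 1/1 = 1.000
# l -> l: 1/2 = 0.500
# l -> o: 1/2 = 0.500
```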

#### Text Generation Example

Using the learned bigram probabilities, we generate "hello" step by step:

  1. Start with "h"
  2. P(e | h) = 1.000 → "he"
  3. P(l | e) = 1.000 → "hel"
  4. P(l | l) = 0.500 → "hell"
  5. P(o | l) = 0.500 → "hello"
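A minimal sketch of this generation loop, sampling from the probabilities above (the starting character "h" and the length cap are assumptions made here for illustration, not the repository's code):

```python
import random

# Learned character-bigram probabilities from the table above.
probs = {
    "h": {"e": 1.0},
    "e": {"l": 1.0},
    "l": {"l": 0.5, "o": 0.5},
}

out = "h"
# Sample the next character until one with no successor ("o") is produced,
# with an arbitrary length cap as a safeguard.
while out[-1] in probs and len(out) < 10:
    nxt = probs[out[-1]]
    out += random.choices(list(nxt), weights=list(nxt.values()))[0]

print(out)  # e.g. "hello"; extra "l"s are possible because P(l | l) = 0.500
```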
