This repository contains a modular implementation of a Transformer model built from scratch in NumPy using object-oriented design, without PyTorch or TensorFlow.
It also includes training notebooks, Word2Vec-based embeddings, and utilities for low-level neuron analysis and debugging.
Implements the full Transformer architecture described in the paper "Attention Is All You Need":
- Self Attention
- Scaled Dot-Product Attention
- Feed-Forward Networks (FFN)
- Residual Connections + LayerNorm
- Positional Encoding
- Encoder Layer
- Decoder Layer
- Masked (causal) attention for decoding
- Cross-attention between encoder → decoder
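As an illustration of the attention components listed above, here is a minimal NumPy sketch of scaled dot-product attention with an optional causal mask. The function names, argument order, and tensor shapes are assumptions made for this example and are not necessarily those used in the repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, causal=False):
    """q: (batch, t_q, d_k), k: (batch, t_k, d_k), v: (batch, t_k, d_v)."""
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (batch, t_q, t_k)
    if causal:
        t_q, t_k = scores.shape[-2:]
        # True above the diagonal marks future positions that must be hidden.
        future = np.triu(np.ones((t_q, t_k), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Self-attention on a random batch: queries, keys, and values share one input.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))
out, w = scaled_dot_product_attention(x, x, x, causal=True)
```

With `causal=True`, each position attends only to itself and earlier positions, which is the behaviour the decoder's masked attention relies on. Cross-attention uses the same function with `q` taken from the decoder and `k`, `v` taken from the encoder output.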
Supports:
- Batching
- Sequence-level attention
- Word2Vec embeddings as token vectors
Contains fundamental neural components implemented from scratch:
- Layer_Dense
- Activation_ReLU
- Activation_Softmax
- Loss_CrossCategoricalEntropy
- OptimizerAdam (with momentum, RMS, and bias correction)
These mimic deep learning library internals but are written manually for transparency.
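To make the optimizer description concrete, the following is a minimal sketch of a single Adam update with momentum, an RMS term, and bias correction. It is written as a standalone function for illustration; `adam_step` and its default hyperparameters are assumptions for this example, not the interface of the repository's `OptimizerAdam` class.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updated parameter and moment estimates for step t (t starts at 1)."""
    # Momentum: exponential moving average of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    # RMS term: exponential moving average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v being initialised at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimise f(w) = sum(w ** 2), whose gradient is 2 * w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
```

On the very first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is approximately the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's scale.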
The project uses:
- nltk.word_tokenize for tokenization
- gensim.Word2Vec for dense vector embeddings
Workflow:
- Tokenize English/Spanish sentences
- Convert tokens → vectors via Word2Vec
- Pass sequence embeddings → Transformer
Shows the complete flow:
- Tokenization
- Vocabulary mapping
- Embedding lookup
- Padding & batching
- Forward pass
- Loss computation
- Backpropagation
- Parameter updates (Adam)
- Logging loss curves
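The padding, batching, and loss steps above can be sketched as follows. This is a simplified illustration: `PAD_ID`, `pad_batch`, and `masked_cross_entropy` are hypothetical names chosen for this example, and the repository's notebooks may organise these steps differently.

```python
import numpy as np

PAD_ID = 0  # hypothetical padding id; the actual vocabulary mapping may differ

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad variable-length id lists to a (batch, max_len) array plus a mask."""
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, seq in enumerate(sequences):
        batch[i, :len(seq)] = seq
        mask[i, :len(seq)] = True  # True marks real tokens, False marks padding
    return batch, mask

def masked_cross_entropy(probs, targets, mask, eps=1e-12):
    """probs: (batch, T, vocab) softmax outputs; targets, mask: (batch, T)."""
    b, t = np.indices(targets.shape)
    # Probability the model assigned to the correct token at each position.
    picked = probs[b, t, targets]
    nll = -np.log(picked + eps)
    # Average only over real tokens so padding does not affect the loss.
    return float((nll * mask).sum() / mask.sum())

ids, mask = pad_batch([[5, 2, 7], [3, 4]])
uniform = np.full((2, 3, 10), 0.1)  # a model that guesses uniformly over 10 tokens
loss = masked_cross_entropy(uniform, ids, mask)
```

For a uniform prediction over a 10-token vocabulary the loss equals `log(10)`, which is a convenient sanity check before training starts.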
- Start with the <SOS> token
- Autoregressive decoding
- Add positional encodings each step
- Use encoder output for all decoding steps
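The decoding steps above can be sketched as a greedy loop. This is an illustrative sketch under several assumptions: `decoder_fn` is a hypothetical callable standing in for the decoder stack plus output projection, `embed` is a hypothetical embedding matrix indexed by token id, and `d_model` is assumed to be even. The positional encoding follows the sinusoidal formulation from the original paper.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings of shape (max_len, d_model); d_model must be even."""
    pos = np.arange(max_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

def greedy_decode(decoder_fn, enc_out, embed, sos_id, eos_id, max_len=20):
    """decoder_fn(dec_in, enc_out) -> (T, vocab) scores; a hypothetical interface."""
    pe = positional_encoding(max_len, embed.shape[1])
    tokens = [sos_id]
    for _ in range(max_len - 1):
        # Re-embed the tokens generated so far and add positional encodings.
        dec_in = embed[tokens] + pe[:len(tokens)]
        # The same encoder output is reused at every decoding step.
        scores = decoder_fn(dec_in, enc_out)
        next_id = int(np.argmax(scores[-1]))  # greedy choice at the last position
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Toy stand-in decoder: emits id 1, then 2, and so on, until id 5 (used as end token).
def toy_decoder(dec_in, enc_out):
    t = dec_in.shape[0]
    scores = np.zeros((t, 6))
    scores[-1, min(t, 5)] = 1.0
    return scores

rng = np.random.default_rng(0)
embed = rng.normal(size=(6, 8))
enc_out = np.zeros((3, 8))
tokens = greedy_decode(toy_decoder, enc_out, embed, sos_id=0, eos_id=5)
```

Greedy selection is used here only for simplicity; the loop structure stays the same if the `argmax` is replaced by sampling or beam search.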
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt