
LLama-from-scratch

An LLM from Scratch in Pure C++/CUDA

Note: This project is currently a work in progress.

Welcome to the LLama-from-scratch project! Our goal is to build a large language model (LLM) entirely from scratch using C++ and CUDA, leveraging the power of parallel computing for efficient training and inference.

Project Overview

This project aims to implement a full-fledged LLM by following these key steps:

  1. Tensor Operations
  2. CUDA Parallelization
  3. Backpropagation for Tensor Class
  4. Enhanced Parallelization
  5. SentencePiece Tokenizer
  6. Implementing Embeddings
  7. Feed-Forward Networks (FFNs)
  8. Flash Attention Mechanism
  9. RoPE Scaling and Other Peripheral Functions
  10. Building Encoders
  11. Integration and Cohesion
  12. Training and Inference
  13. Instruction Fine-Tuning

1. Tensor Operations

  • Objective: Develop a robust Tensor class to handle multidimensional arrays and basic tensor operations such as addition, subtraction, and multiplication.
  • Implementation:
    • Define tensor data structures and initialize tensors with various data types.
    • Implement tensor operations with type safety and memory management.
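
For a sense of the shape this takes, here is a minimal sketch of a Tensor class with flat float storage and element-wise addition. The class layout and names are illustrative assumptions, not the repository's actual API.

```cpp
// Minimal illustrative Tensor: flat float storage plus a shape, with
// element-wise addition. No broadcasting, no GPU, no autograd yet.
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

class Tensor {
public:
    explicit Tensor(std::vector<size_t> shape)
        : shape_(std::move(shape)),
          data_(std::accumulate(shape_.begin(), shape_.end(), size_t{1},
                                std::multiplies<size_t>())) {}

    size_t size() const { return data_.size(); }
    float& operator[](size_t i) { return data_[i]; }
    const float& operator[](size_t i) const { return data_[i]; }

    // Element-wise addition; shapes must match exactly.
    Tensor operator+(const Tensor& other) const {
        if (shape_ != other.shape_) throw std::invalid_argument("shape mismatch");
        Tensor out(shape_);
        for (size_t i = 0; i < size(); ++i) out.data_[i] = data_[i] + other.data_[i];
        return out;
    }

private:
    std::vector<size_t> shape_;   // e.g. {batch, rows, cols}
    std::vector<float> data_;     // row-major contiguous storage
};
```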

2. CUDA Parallelization

  • Objective: Leverage CUDA to parallelize tensor operations for performance improvements.
  • Implementation:
    • Identify computationally intensive operations within the Tensor class.
    • Offload these operations to the GPU using CUDA kernels.
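
As a rough illustration of the approach, an element-wise addition kernel and its launch might look like the following; the kernel name, block size, and host wrapper are assumptions for the sketch, not the project's code.

```cuda
// Illustrative element-wise addition kernel: c[i] = a[i] + b[i].
__global__ void add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Host-side launch: one thread per element, rounded up to full blocks.
void add_on_gpu(const float* d_a, const float* d_b, float* d_c, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
}
```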

3. Backpropagation for Tensor Class

  • Objective: Implement backpropagation to support training of neural networks.
  • Implementation:
    • Extend the Tensor class to store gradients and support gradient computation.
    • Implement backward operations for each tensor operation.
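
One common way to structure this is to give each node a gradient buffer and a recorded backward function that scatters gradients to its parents. The sketch below uses hypothetical names and only covers element-wise multiplication; it is not the repository's implementation.

```cpp
// Sketch of reverse-mode autodiff for a graph of 1-D float tensors.
#include <functional>
#include <memory>
#include <vector>

struct Node {
    std::vector<float> value;
    std::vector<float> grad;                      // same length as value, starts at zero
    std::vector<std::shared_ptr<Node>> parents;
    std::function<void()> backward_fn;            // accumulates grad into parents

    explicit Node(std::vector<float> v) : value(std::move(v)), grad(value.size(), 0.f) {}
};

// y = a * b (element-wise). Backward: da += dy * b, db += dy * a.
std::shared_ptr<Node> mul(std::shared_ptr<Node> a, std::shared_ptr<Node> b) {
    std::vector<float> v(a->value.size());
    for (size_t i = 0; i < v.size(); ++i) v[i] = a->value[i] * b->value[i];
    auto out = std::make_shared<Node>(std::move(v));
    out->parents = {a, b};
    out->backward_fn = [a, b, out_raw = out.get()]() {
        for (size_t i = 0; i < out_raw->grad.size(); ++i) {
            a->grad[i] += out_raw->grad[i] * b->value[i];
            b->grad[i] += out_raw->grad[i] * a->value[i];
        }
    };
    return out;
}
```

A full backward pass would then seed the output's grad with ones and invoke each node's backward_fn in reverse topological order.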

5. SentencePiece Tokenizer

  • Objective: Implement the SentencePiece tokenizer for efficient text processing.
  • Implementation:
    • Integrate the SentencePiece library to tokenize and detokenize input text.
    • Ensure compatibility with the Tensor class for processing tokenized data.
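
The SentencePiece C++ API supports this directly; a minimal encode/decode round trip (with a hypothetical model path, tokenizer.model) looks roughly like this.

```cpp
#include <sentencepiece_processor.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    sentencepiece::SentencePieceProcessor sp;
    // Load a previously trained SentencePiece model (path is an example).
    if (!sp.Load("tokenizer.model").ok()) return 1;

    std::vector<int> ids;
    sp.Encode("Hello, world!", &ids);             // text -> token ids
    for (int id : ids) std::cout << id << ' ';
    std::cout << '\n';

    std::string text;
    sp.Decode(ids, &text);                        // token ids -> text
    std::cout << text << '\n';
}
```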

6. Implementing Embeddings

  • Objective: Develop embedding layers to convert tokens into dense vectors.
  • Implementation:
    • Implement word, positional, and segment embeddings.
    • Optimize embedding lookup operations using CUDA.
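
A token-embedding lookup is a row gather from a [vocab_size, dim] table, which maps naturally onto a CUDA kernel. The kernel below is an illustrative sketch, not the project's implementation.

```cuda
// Gather one embedding row per token: out[t] = table[token_ids[t]].
// table: [vocab_size, dim], out: [num_tokens, dim], both row-major.
__global__ void embedding_lookup(const float* table, const int* token_ids,
                                 float* out, int num_tokens, int dim) {
    int t = blockIdx.x;                                   // one block per token
    if (t >= num_tokens) return;
    const float* row = table + (size_t)token_ids[t] * dim;
    for (int d = threadIdx.x; d < dim; d += blockDim.x) { // threads stride over features
        out[(size_t)t * dim + d] = row[d];
    }
}
```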

7. Feed-Forward Networks (FFNs)

  • Objective: Build FFNs as core components of the neural network.
  • Implementation:
    • Develop fully connected layers with activation functions.
    • Optimize forward and backward passes using parallelization.
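
For reference, a transformer feed-forward block is two linear maps around a non-linearity. The naive CPU sketch below uses ReLU for brevity; LLaMA-style models typically use SwiGLU, and the names and layout here are assumptions rather than the project's code.

```cpp
#include <algorithm>
#include <vector>

// y = W2 * relu(W1 * x + b1) + b2, with row-major weight matrices.
// W1: [hidden, dim], W2: [dim, hidden]. Naive loops, no parallelism.
std::vector<float> ffn_forward(const std::vector<float>& x,    // [dim]
                               const std::vector<float>& W1,   // [hidden * dim]
                               const std::vector<float>& b1,   // [hidden]
                               const std::vector<float>& W2,   // [dim * hidden]
                               const std::vector<float>& b2,   // [dim]
                               int dim, int hidden) {
    std::vector<float> h(hidden), y(dim);
    for (int i = 0; i < hidden; ++i) {
        float s = b1[i];
        for (int j = 0; j < dim; ++j) s += W1[i * dim + j] * x[j];
        h[i] = std::max(0.0f, s);                              // ReLU
    }
    for (int i = 0; i < dim; ++i) {
        float s = b2[i];
        for (int j = 0; j < hidden; ++j) s += W2[i * hidden + j] * h[j];
        y[i] = s;
    }
    return y;
}
```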

8. Flash Attention Mechanism

  • Objective: Implement an efficient attention mechanism using Flash Attention.
  • Implementation:
    • Design attention layers with scaled dot-product attention.
    • Optimize memory access patterns and computation using CUDA.
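
Flash Attention computes softmax(QK^T / sqrt(d)) V in tiles that stay in fast on-chip memory instead of materializing the full score matrix. As a correctness reference only, the unfused single-head version is sketched below; it is not the fused kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// out = softmax(Q K^T / sqrt(d)) V for one head.
// Q, K, V: [seq_len, d] row-major. Naive O(seq_len^2 * d) reference.
std::vector<float> attention(const std::vector<float>& Q, const std::vector<float>& K,
                             const std::vector<float>& V, int seq_len, int d) {
    std::vector<float> out(seq_len * d, 0.0f);
    const float scale = 1.0f / std::sqrt((float)d);
    for (int i = 0; i < seq_len; ++i) {
        // Scores for query i against every key, with a max-subtracted softmax.
        std::vector<float> s(seq_len);
        float max_s = -1e30f;
        for (int j = 0; j < seq_len; ++j) {
            float dot = 0.0f;
            for (int k = 0; k < d; ++k) dot += Q[i * d + k] * K[j * d + k];
            s[j] = dot * scale;
            max_s = std::max(max_s, s[j]);
        }
        float denom = 0.0f;
        for (int j = 0; j < seq_len; ++j) { s[j] = std::exp(s[j] - max_s); denom += s[j]; }
        // Weighted sum of value rows.
        for (int j = 0; j < seq_len; ++j)
            for (int k = 0; k < d; ++k) out[i * d + k] += (s[j] / denom) * V[j * d + k];
    }
    return out;
}
```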

9. RoPE Scaling and Other Peripheral Functions

  • Objective: Implement additional features and scaling techniques for model robustness.
  • Implementation:
    • Incorporate rotary position encodings (RoPE) for better sequence modeling.
    • Develop auxiliary functions and utilities to support training and inference.
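
RoPE rotates each consecutive pair of query/key features by a position-dependent angle theta_i = 10000^(-2i/d). Below is a CPU sketch of applying it in place to a single head vector; the function name and layout are illustrative.

```cpp
#include <cmath>
#include <vector>

// Apply rotary position embeddings in place to one head vector x of length d
// (d must be even) at sequence position pos. Pairs (x[2i], x[2i+1]) are rotated
// by angle pos * theta_i with theta_i = 10000^(-2i/d).
void apply_rope(std::vector<float>& x, int pos, int d) {
    for (int i = 0; i < d / 2; ++i) {
        float theta = std::pow(10000.0f, -2.0f * i / d);
        float c = std::cos(pos * theta), s = std::sin(pos * theta);
        float x0 = x[2 * i], x1 = x[2 * i + 1];
        x[2 * i]     = x0 * c - x1 * s;
        x[2 * i + 1] = x0 * s + x1 * c;
    }
}
```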

11. Integration and Cohesion

  • Objective: Integrate all components to form a cohesive LLM framework.
  • Implementation:
    • Ensure seamless data flow between components.
    • Validate the integrated model through rigorous testing.

12. Training and Inference

  • Objective: Train the LLM and perform efficient inference.
  • Implementation:
    • Develop training loops with backpropagation and optimization algorithms.
    • Implement inference mechanisms for real-time text generation.
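
The loop structure is the same regardless of model size: forward pass, loss, backward pass, parameter update. The tiny runnable example below fits a line with plain SGD purely to illustrate that structure; it is not the project's training code, and the data and hyperparameters are made up for the demo.

```cpp
#include <cstdio>
#include <vector>

// Toy training loop: fit y = 2x + 1 with mean-squared error and SGD.
// The same forward / loss / gradient / update cycle scales to the full model.
int main() {
    std::vector<float> xs = {0, 1, 2, 3, 4}, ys = {1, 3, 5, 7, 9};
    float w = 0.0f, b = 0.0f, lr = 0.05f;
    for (int step = 0; step < 2000; ++step) {
        float dw = 0.0f, db = 0.0f;
        for (size_t i = 0; i < xs.size(); ++i) {
            float pred = w * xs[i] + b;          // forward pass
            float err  = pred - ys[i];           // dLoss/dpred for 0.5 * err^2
            dw += err * xs[i];                   // backward pass
            db += err;
        }
        w -= lr * dw / xs.size();                // SGD update
        b -= lr * db / xs.size();
    }
    std::printf("w=%.3f b=%.3f\n", w, b);        // converges toward w~2, b~1
}
```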

13. Instruction Fine-Tuning

  • Objective: Fine-tune the trained LLM for specific instructions and tasks.
  • Implementation:
    • Use supervised fine-tuning techniques with task-specific datasets.
    • Optimize the model for low-latency inference and high accuracy.

Contributions

Contributions are very welcome; I'm still figuring much of this out as I go, and even this README was generated with ChatGPT to give readers a rough idea of what I'm trying to accomplish. If you have improvements, just open a PR and I'll most likely merge it.

License

This project is licensed under the MIT License.
