
Build a Large Language Model From Scratch

Overview

This repository contains a custom implementation of a Large Language Model (LLM) based on the GPT-2 architecture. The project demonstrates the process of building a transformer-based model from scratch, loading pre-trained weights, and generating text using causal language modeling techniques. It is heavily inspired by Sebastian Raschka's book "Build a Large Language Model (From Scratch)" and Vizuara's YouTube playlist, which provide comprehensive guidance on understanding and implementing LLMs.
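For orientation, below is a minimal sketch of the causal self-attention block at the core of the GPT-2 architecture, written in PyTorch; the class and parameter names here are illustrative assumptions, not this repository's actual modules.

```python
# A minimal sketch of GPT-2-style causal self-attention (assumes PyTorch).
# Names and defaults are illustrative, not this repository's actual API.
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, emb_dim, num_heads, context_len, dropout=0.1):
        super().__init__()
        assert emb_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = emb_dim // num_heads
        self.qkv = nn.Linear(emb_dim, 3 * emb_dim)  # joint query/key/value projection
        self.proj = nn.Linear(emb_dim, emb_dim)     # output projection
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future tokens (causal LM).
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim) for multi-head attention.
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / self.head_dim**0.5
        att = att.masked_fill(self.mask[:t, :t], float("-inf"))
        att = self.dropout(torch.softmax(att, dim=-1))
        out = (att @ v).transpose(1, 2).contiguous().view(b, t, d)
        return self.proj(out)
```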

Key Features

Full implementation of the GPT-2 architecture.

Pre-trained weight loading for text generation tasks.

Customizable sampling parameters (e.g., temperature and top-k) for coherent text generation; see the sketch after this list.

Modular design for easy experimentation and extension.
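To illustrate the sampling features above, here is a hedged sketch of autoregressive generation with temperature scaling and top-k filtering; `model`, `token_ids`, and the function signature are assumptions for illustration, not the repository's actual interface.

```python
# A sketch of temperature and top-k sampling for causal text generation.
# `model` and the signature below are assumed placeholders, not this
# repository's actual API.
import torch

@torch.no_grad()
def generate(model, token_ids, max_new_tokens, context_len,
             temperature=1.0, top_k=50):
    for _ in range(max_new_tokens):
        # Condition only on the most recent context_len tokens.
        logits = model(token_ids[:, -context_len:])[:, -1, :]
        logits = logits / temperature
        if top_k is not None:
            # Mask out every logit below the k-th largest one.
            kth = torch.topk(logits, top_k).values[:, -1, None]
            logits = logits.masked_fill(logits < kth, float("-inf"))
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        token_ids = torch.cat([token_ids, next_id], dim=1)
    return token_ids
```

Lower temperatures sharpen the distribution toward greedy decoding, while smaller top-k values restrict sampling to the most likely tokens; both trade diversity for coherence.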

Resources

This project draws inspiration from the following resources:

Book: Build a Large Language Model (From Scratch) by Sebastian Raschka

The book provides step-by-step guidance on creating LLMs, including coding attention mechanisms, pretraining, fine-tuning for classification, and instruction fine-tuning.

Learn more about the book here: https://www.manning.com/books/build-a-large-language-model-from-scratch

YouTube Playlist: Vizuara

A detailed video series explaining LLM concepts and implementation strategies.

Check out their playlist here: https://www.youtube.com/playlist?list=PLPTV0NXA_ZSgsLAr8YCgCwhPIJNNtexWu

Acknowledgments

This project is inspired by the following:

Sebastian Raschka's book, which offers an in-depth exploration of LLM development.

Vizuara's YouTube Playlist, which provides practical insights into building LLMs step-by-step.

Special thanks to these resources for making complex concepts accessible to learners and developers!

Feel free to explore, contribute, or use this repository as a foundation for your own projects!
