
Commit 425aaa1

Updated proposal pdf and blog image
1 parent acac7f9 commit 425aaa1

6 files changed: +7 -3 lines changed


.github/actions/spelling/allow/names.txt

Lines changed: 2 additions & 0 deletions

@@ -72,6 +72,7 @@ Svrin
 Tadel
 Taras
 Thessaloniki
+Timmaraju
 Universitat
 Unveristy
 Uppili
@@ -192,6 +193,7 @@ tapaswenipathak
 tfransham
 thakkar
 tharun
+timmaraju
 tlattner
 vaibhav
 vassil

.github/actions/spelling/allow/terms.txt

Lines changed: 2 additions & 0 deletions

@@ -5,6 +5,7 @@ CMSSW
 Cppyy
 Debian
 GPGPU
+GPT
 GSo
 GSoC
 HSF
@@ -28,6 +29,7 @@ cytokine
 cytokines
 gitlab
 gsoc
+llm
 linkedin
 microenvironments
 pythonized

_data/contributors.yml

Lines changed: 1 addition & 1 deletion

@@ -322,7 +322,7 @@
 - title: "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation"
   status: Ongoing
   description: |
-    Training Large Language Models (LLMs) is computationally expensive, often bottlenecked by the performance limitations of Python-based frameworks. This project addresses this challenge by enhancing LLM training efficiency within a C++ environment through the integration of Clad, a Clang/LLVM compiler plugin for automatic differentiation (AD). We will deveop a custom C++ tensor library specifically designed for optimal interaction with Clad. The core objective is to replace traditional runtime or manual gradient computations with Clad's efficient compile-time differentiation for key LLM operations within a GPT-2 training pipeline. This involves investigating effective strategies to bridge Clad's static analysis with dynamic neural network computations, benchmarking the resulting performance gains in speed and memory usage against a non-Clad baseline, and leveraging OpenMP for further parallelization.
+    Training Large Language Models is computationally expensive, often limited by the performance of Python-based frameworks. This project addresses this challenge by enhancing LLM training efficiency within a C++ environment through the integration of Clad, a Clang/LLVM compiler plugin for automatic differentiation (AD). We will develop a custom C++ tensor library specifically designed for optimal interaction with Clad. The core objective is to replace traditional runtime or manual gradient computations with Clad's efficient compile-time differentiation for key LLM operations within a GPT-2 training pipeline. This involves investigating effective strategies to bridge Clad's static analysis with dynamic neural network computations, benchmarking the resulting performance gains in speed and memory usage against a non-Clad baseline, and leveraging OpenMP for further parallelization.
   proposal: /assets/docs/Rohan_Timmaraju_Proposal_2025.pdf
   mentors: Vassil Vassilev, David Lange, Jonas Rembser, Christina Koutsou
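The description above refers to replacing manual or runtime gradient computation with Clad's compile-time differentiation. As a minimal illustration of what that looks like, the sketch below differentiates a toy squared-error loss with `clad::gradient`; the `loss` function and its parameters are hypothetical and not part of the project's code, and the build command assumes a locally built Clad plugin.

```cpp
// Minimal sketch of Clad's reverse-mode AD on a toy loss function.
// Hypothetical example, not the project's tensor library. Assumes Clad is
// available, e.g.: clang++ -fplugin=/path/to/clad.so -I<clad-include> ex.cpp
#include "clad/Differentiator/Differentiator.h"
#include <iostream>

// Toy squared-error "model": prediction w*x + b compared against target y.
double loss(double w, double b, double x, double y) {
  double pred = w * x + b;
  double diff = pred - y;
  return diff * diff;
}

int main() {
  // Clad generates the gradient of `loss` w.r.t. w and b at compile time.
  auto dloss = clad::gradient(loss, "w, b");

  double dw = 0.0, db = 0.0;
  // Evaluate the generated gradient at w=1, b=0, x=2, y=3.
  dloss.execute(/*w=*/1.0, /*b=*/0.0, /*x=*/2.0, /*y=*/3.0, &dw, &db);

  std::cout << "dL/dw = " << dw << ", dL/db = " << db << "\n"; // -4 and -2
  return 0;
}
```

The same pattern is what the project aims to scale up: instead of hand-writing the backward pass for each tensor operation, the gradient code is produced by the compiler plugin from the forward definition.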

_posts/2025-05-21-enhancing-llm-training.md

Lines changed: 2 additions & 2 deletions

@@ -5,7 +5,7 @@ excerpt: "This GSoC project leverages Clad to optimize LLM training in C++, aimi
 sitemap: true
 author: Rohan Timmaraju
 permalink: blogs/gsoc25_rohan_introduction_blog/
-banner_image: /images/blog/gsoc-banner.png
+banner_image: /images/blog/LLM_project_banner.jpg
 date: 2025-05-21
 tags: gsoc c++ clang clad llm
 ---
@@ -29,7 +29,7 @@ This project proposes to tackle this challenge by integrating Clad, an Automatic
 To facilitate this integration, I am developing a custom C++ tensor library to be used in neural network training. Inspired by the powerful approaches of libraries such as [llm.c](https://github.com/karpathy/llm.c) and [pytorch](https://docs.pytorch.org/cppdocs/), this library is being designed from the ground up with Clad compatibility in mind. The core idea is to replace manual or internally managed gradient computations with Clad's reverse-mode AD (as in `clad::gradient`) for key LLM operations like matrix multiplications, activation functions, normalization layers, and the final loss function.

 ### Implementation Plan
-1. **Foundation & Baseline:** We'll start by implementing a complete GPT-2 training loop in C++ *without* Clad. This will serve as our performance baseline. GPT-2 is chosen here as a relatively simple open-source LLM architecture capable of being trained on local devices. This could be extended to other architectures like Llama or Mistral.
+1. **Foundation & Baseline:** The implementation will begin with a complete GPT-2 training loop in C++ *without* Clad. This will serve as our performance baseline. GPT-2 is chosen here as a relatively simple open-source LLM architecture capable of being trained on local devices. This could be extended to other architectures like Llama or Mistral.
 2. **Core Clad Integration Strategy:** We will investigate and evaluate different strategies for applying Clad to tensor network gradient calculations, while also identifying areas where Clad itself could be enhanced for deep learning workloads.
 3. **Expanding Integration:** Once a promising strategy is identified and validated on simpler operations, we'll systematically integrate Clad into more complex components of the GPT-2 architecture.
 4. **Benchmarking & Optimization:** Benchmarking against our baseline will be crucial to quantify the performance gains (speed, memory). We'll also use profiling tools to identify bottlenecks and optimize the tensor library with Clad. OpenMP may be employed for parallelization to further boost performance.
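Item 4 of the plan mentions OpenMP for parallelization, and the post lists matrix multiplication among the key operations of the tensor library. As a rough sketch only (assuming a simple row-major layout; this is not the project's actual kernel), such an operation could be parallelized like this:

```cpp
// Hypothetical sketch of an OpenMP-parallelized matmul kernel, assuming
// row-major storage. Illustrative only; compile with -fopenmp to enable
// threading (the pragma is ignored otherwise and the code stays correct).
#include <vector>

// C (M x N) = A (M x K) * B (K x N)
void matmul(const std::vector<float>& A, const std::vector<float>& B,
            std::vector<float>& C, int M, int K, int N) {
  #pragma omp parallel for  // distribute output rows across threads
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k)
        acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
  }
}
```

A baseline of this kind would then be compared, in the benchmarking step, against the same forward kernels with Clad-generated gradients, measuring speed and memory as the post describes.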
-648 KB · Binary file not shown.

images/blog/LLM_project_banner.jpg

354 KB
