AccelOpt is a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided, hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs.
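The loop described above can be sketched in miniature as follows. All names here are hypothetical placeholders for illustration only, not AccelOpt's actual API; the LLM generation step and the profiler are stubbed out with toy stand-ins so the sketch runs on its own.

```python
# Minimal sketch of an optimization-memory loop (hypothetical names,
# not AccelOpt's actual API): each iteration proposes a new kernel
# variant, profiles it, and curates slow->fast pairs as experience.

def propose(kernel, memory):
    # Placeholder for the LLM generation step; a real system would
    # condition the prompt on the curated memory.
    return kernel + "_opt"

def profile(kernel):
    # Placeholder cost model standing in for hardware profiling.
    return 10.0 / (1 + kernel.count("_opt"))

def optimize(kernel, iterations=3):
    memory = []  # curated (slow, fast, insight) experiences
    best, best_latency = kernel, profile(kernel)
    for _ in range(iterations):
        candidate = propose(best, memory)
        latency = profile(candidate)
        if latency < best_latency:
            memory.append((best, candidate, "candidate was faster"))
            best, best_latency = candidate, latency
    return best, best_latency, memory
```

The point of the memory is that later generations are informed by earlier slow-fast pairs rather than starting from scratch each iteration.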
🚧 This repository is still under construction.
EC2 Instance: trn1.32xlarge
AMI: Deep Learning AMI Neuron (Ubuntu 22.04)
source /opt/aws_neuronx_venv_pytorch_2_7/bin/activate # Check the PyTorch version of your AMI
pip install logfire
pip install openai-agents
git clone [email protected]:zhang677/AccelOpt.git
cd AccelOpt
python setup.py install
experiments/full_complete_local shows how to run AccelOpt on NKIBench with a locally served gpt-oss-120b.
EC2 Instance: p5.48xlarge
AMI: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.8 (Ubuntu 24.04)
# Create a PyTorch environment first
pip install logfire
pip install openai-agents
git clone https://github.com/zhang677/flashinfer-bench.git # DPS is currently not consistent with flashinfer-trace
cd flashinfer-bench
pip install -v -e .
cd ..
git clone [email protected]:zhang677/AccelOpt.git
cd AccelOpt
python setup.py install
experiments/flb_full_complete_local shows how to run AccelOpt on FlashInfer-Bench with a locally served gpt-oss-120b.
NKIBench is a new benchmark suite of AWS Trainium accelerator kernels of varying complexity, extracted from real-world LLM workloads, used to evaluate the effectiveness of AccelOpt.
All kernels are under /NKIBench, grouped by operator name and configuration in a structured storage format, as shown in /NKIBench/summary.json.
AccelOpt provides an NKIKernel class that is pluggable into any AI optimizer. /tests shows how to use the profiling API for a single NKI kernel and for groups of NKI kernels.
from accelopt.kernel_wrapper import NKIKernel
nki_kernel = NKIKernel(program_path, base_numpy_path)
result = nki_kernel.profile(save_fields)
NKIBench estimates the best achievable performance offered by the Trainium hardware, which provides additional insight into how effectively AccelOpt has explored the entire optimization landscape. The best achievable performance is calculated in experiments/full_complete_local/calculate_percentage_of_peak.py.
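As a rough illustration of the metric (not the actual code of calculate_percentage_of_peak.py), the percentage of peak compares a kernel's measured latency against the estimated best achievable latency; since latency is inversely proportional to throughput, the ratio is best over measured:

```python
def percentage_of_peak(measured_latency_us, best_achievable_latency_us):
    """Fraction of the hardware's estimated best performance, as a percentage.

    A kernel that matches the estimated best achievable latency scores 100%;
    a kernel twice as slow scores 50%.
    """
    return 100.0 * best_achievable_latency_us / measured_latency_us

print(percentage_of_peak(20.0, 10.0))  # 2x slower than best -> 50.0
```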
This work is a follow-up to Adaptive Self-improvement LLM Agentic System for ML Library Development (ICML 2025) [paper] [blog] [code]. If you find this project useful, please cite:
@inproceedings{zhang2025adaptive,
title={Adaptive Self-improvement LLM Agentic System for ML Library Development},
author={Zhang, Genghan and Liang, Weixin and Hsu, Olivia and Olukotun, Kunle},
booktitle={Forty-second International Conference on Machine Learning},
year={2025}
}
@article{zhang2025accelopt,
title={AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization},
author={Zhang, Genghan and Zhu, Shaowei and Wei, Anjiang and Song, Zhenyu and Nie, Allen and Jia, Zhen and Vijaykumar, Nandita and Wang, Yida and Olukotun, Kunle},
journal={arXiv preprint arXiv:2511.15915},
year={2025}
}