Welcome to the code repository of Autocomp. Recent updates:
(1/22/2026) Reorganized repo structure to make it easier to add a new backend.
(1/8/2026) Check out our latest 📝 blog post on optimizing attention on Trainium!
📚 Paper: Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators
✏️ Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, and Yakun Sophia Shao (UC Berkeley)
Autocomp is an LLM-driven code optimizer for tensor accelerators. Autocomp is designed to be portable and easy to use across a variety of hardware backends, and has already demonstrated strong performance on an industry accelerator (AWS Trainium), an academic accelerator (Gemmini), NVIDIA GPUs, and even the RISC-V Vector Extension.
Autocomp decomposes the optimization problem into a beam search, where each iteration is further divided into a planning phase and an implementation phase. Autocomp applies the user's domain knowledge, along with a variety of techniques to successfully explore the search space, in order to iteratively improve the code. For more details, see our paper.
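For intuition, here is a minimal sketch of this style of beam search in Python. The names (`beam_search_optimize`, `propose_plans`, `implement_plan`, `evaluate`, `beam_size`, `num_plans`, `num_impls`) are illustrative placeholders, not Autocomp's actual API; the real search lives in `autocomp/search/search.py`.

```python
# Hypothetical sketch of a planning/implementation beam search; names are
# illustrative and do not correspond to Autocomp's actual functions.
from typing import Callable

def beam_search_optimize(
    initial_code: str,
    propose_plans: Callable[[str, int], list[str]],        # planning phase (LLM)
    implement_plan: Callable[[str, str, int], list[str]],  # implementation phase (LLM)
    evaluate: Callable[[str], float],                      # backend cost, lower is better
    beam_size: int = 4,
    num_plans: int = 4,
    num_impls: int = 2,
    iterations: int = 10,
) -> str:
    beam = [initial_code]
    for _ in range(iterations):
        candidates = list(beam)  # keep current programs so the beam never regresses
        for code in beam:
            # Planning phase: propose several natural-language optimization plans.
            for plan in propose_plans(code, num_plans):
                # Implementation phase: turn each plan into candidate code.
                candidates.extend(implement_plan(code, plan, num_impls))
        # Evaluate every candidate on the target backend and keep the best few.
        beam = sorted(candidates, key=evaluate)[:beam_size]
    return beam[0]
```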
Autocomp can currently optimize code for the following backends:
- Trainium (trn_setup.md)
- Gemmini (gemmini_setup.md)
- CUDA via KernelBench (kb_setup.md)
Partially supported backends:
- RISC-V Vector (RVV) on Canaan Kendryte K230. See the `k230` branch for code. As the implementation is very hacky, we do not currently recommend using this backend.
For instructions on adding a new backend, see ADDING_A_BACKEND.md.
Autocomp supports both local and remote endpoint LLM inference. For local inference, we support vLLM's OpenAI-compatible server. For endpoint inference, we support a variety of providers (see below).
- Install and launch vLLM:

  ```
  pip install vllm
  vllm serve Qwen/Qwen3-8B --port 8000 -tp <number of GPUs>
  ```

- Configure Autocomp: Set `models`/`code_models` in `search.py`: `models = ["vllm::Qwen/Qwen3-8B"]`. Optionally set `VLLM_API_BASE` if using a different host/port (default: `http://localhost:8000/v1`).

For more details, see the vLLM documentation.
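To sanity-check that the server is up before running Autocomp, you can query it with the standard OpenAI Python client (this snippet is not part of Autocomp itself and assumes the default host/port above):

```python
# Quick smoke test of the local vLLM OpenAI-compatible server (assumes the
# defaults above); requires `pip install openai`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key
response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```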
API keys can be configured via environment variables or in autocomp/common/keys.py. Environment variables take precedence over the keys file. The variable names in keys.py match the corresponding environment variable names.
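The precedence rule amounts to something like the following sketch; `resolve_key` is a hypothetical helper for illustration only, not a function in the Autocomp codebase.

```python
# Hypothetical illustration of the precedence described above: environment
# variables win, keys.py is the fallback. `resolve_key` is not Autocomp API.
import os
from typing import Optional

try:
    from autocomp.common import keys  # the keys.py file described below
except ImportError:
    keys = None

def resolve_key(name: str) -> Optional[str]:
    env_value = os.environ.get(name)
    if env_value:
        return env_value
    return getattr(keys, name, None) if keys is not None else None

# e.g., resolve_key("OPENAI_API_KEY")
```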
Supported keys:
| Provider | Environment Variable / Key Name | Provider Name in `search.py` |
| ------------ | ------------------------------------------------ | ------------ |
| OpenAI | `OPENAI_API_KEY` | `openai` |
| Anthropic | `ANTHROPIC_API_KEY` | `anthropic` |
| Together | `TOGETHER_API_KEY` | `together` |
| AWS Bedrock | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` | `aws` |
| Google Cloud | `GOOGLE_CLOUD_LOCATION`, `GOOGLE_CLOUD_PROJECT` | `gcp` |
Example `autocomp/common/keys.py`:

```python
OPENAI_API_KEY = "sk-..."
ANTHROPIC_API_KEY = "sk-ant-..."
TOGETHER_API_KEY = "..."
AWS_ACCESS_KEY_ID = "AKIA..."
AWS_SECRET_ACCESS_KEY = "..."
GOOGLE_CLOUD_LOCATION = "us-central1"
GOOGLE_CLOUD_PROJECT = "my-project"
```

Keys can be omitted if not needed. On startup, Autocomp logs which keys are available.
To use Gemini via Google Cloud, install the Google Cloud CLI as described at https://docs.cloud.google.com/sdk/docs/install-sdk#linux.
Run `gcloud auth application-default login` to set up application default credentials.
Note that we currently only support Anthropic models on AWS Bedrock.
autocomp/search/search.py is the entry point for running Autocomp optimization. Various parameters such as backend, models used, beam size, number of plans, number of code implementations, dropout, etc. can be configured here.
Notable parameters:
- `backend`: The hardware backend to use. Currently supported backends are `gemmini`, `trn`, and `cuda`.
- `models`: The list of models to use. Models are specified as `"<provider>::<model>"`, for example `"openai::gpt-5.2"` or `"gcp::gemini-3-pro-preview"`. Currently supported endpoint providers are OpenAI (`openai`), Google Vertex AI (`gcp`), Anthropic (`anthropic`), AWS Bedrock (`aws`), and Together (`together`). Use provider `vllm` for local serving.
- `code_models`: The list of models to use for the implementation phase of prompting, if you would like to use a distinct set of models from planning. Can be set to `None` to use the same set of models.
- `simulator`: The evaluation method to use.
  - For Trainium, `trn`.
  - For Gemmini, `spike` (only optimizes instruction counts, not cycle counts) or `firesim`.
  - For CUDA, `kernelbench`.
- `iterations`: The number of iterations to run.
- `search_strategy`: The search strategy to use. Currently only `beam` is supported.
- `prob_type`: The problem type to use.
  - For Trainium, `trn-tutorial` or `trn-advanced`.
  - For Gemmini, `gemm`, `conv`, or `admm-multifunction`.
  - For CUDA, `kb-level1`, `kb-level2`, `kb-level3`, or `kb-level4`.
- `prob_id`: The problem ID to use.
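Putting these together, an illustrative configuration might look like the sketch below. The parameter names follow the list above, but the specific values are examples only; edit them in `autocomp/search/search.py` to match your setup.

```python
# Illustrative settings for autocomp/search/search.py. Parameter names follow
# the list above; the specific values here are examples, not defaults.
backend = "gemmini"                        # "gemmini", "trn", or "cuda"
models = ["openai::gpt-5.2",
          "gcp::gemini-3-pro-preview"]     # planning models, "<provider>::<model>"
code_models = None                         # None => reuse `models` for implementation
simulator = "spike"                        # Gemmini: "spike" or "firesim"
iterations = 10
search_strategy = "beam"                   # only "beam" is currently supported
prob_type = "gemm"                         # Gemmini: "gemm", "conv", or "admm-multifunction"
prob_id = 0                                # example value; pick the benchmark to run
```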
- `autocomp/` - Core Autocomp code.
  - `search/` - Search algorithm (`search.py`) and optimization infrastructure.
  - `agents/` - LLM agents for planning and code generation. Each backend has its own subdirectory (e.g., `gemmini/`, `trn/`, `cuda/`) with agent code and prompts.
  - `backend/` - Hardware evaluation. Each backend has its own subdirectory (e.g., `gemmini/`, `trn/`, `kernelbench/`) with evaluation code and setup instructions.
  - `common/` - Shared utilities (LLM interface, logging, etc.).
- `sols/` - Baseline code for benchmarks (organized by problem type).
- `tests/` - Test cases corresponding to `sols/`.
- `examples/` - Example optimization traces from Autocomp.
```bibtex
@misc{hong2025autocomp,
      title={Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators},
      author={Charles Hong and Sahil Bhatia and Alvin Cheung and Yakun Sophia Shao},
      year={2025},
      eprint={2505.18574},
      archivePrefix={arXiv},
      primaryClass={cs.PL},
      url={https://arxiv.org/abs/2505.18574},
}
```
(11/18/2025) Added documentation for adding a new backend (ADDING_A_BACKEND.md), added the examples directory for example optimization traces, and published 📝 blog post 4 about how we optimized conv1d on Trainium.
(11/3/2025) Added code/documentation for setting up Trainium backend. Check out 📝 blog post 3 for more details.
(9/22/2025) Added code/documentation for setting up CUDA/KernelBench backend, plus code for RVV optimization. Check out 📝 blog post 2 for more details.
(6/6/2025) Initial code + 📝 blog post 1 release!