```bash
git clone https://github.com/HPMLL/SpInfer_EuroSys25.git
cd SpInfer_EuroSys25
git submodule update --init --recursive
source Init_SpInfer.sh
cd $SpInfer_HOME/third_party/FasterTransformer && git apply ../ft_spinfer.patch
cd $SpInfer_HOME/third_party/sputnik && git apply ../sputnik.patch
```
- Requirements (a quick sanity-check snippet follows this list):
  - Ubuntu 16.04+
  - gcc >= 7.3 and cmake >= 3.30.3
  - CUDA >= 12.2 and nvcc >= 12.0
  - NVIDIA GPU with sm >= 80 (i.e., Ampere A6000 or Ada RTX 4090)
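An optional way to verify these requirements before building; only standard tool flags are used here (the `compute_cap` query needs a reasonably recent NVIDIA driver):
```bash
# Optional sanity check of the toolchain against the requirements above.
gcc --version | head -n1                              # expect gcc >= 7.3
cmake --version | head -n1                            # expect cmake >= 3.30.3
nvcc --version | grep release                         # expect release 12.x
nvidia-smi --query-gpu=name,compute_cap --format=csv  # expect compute_cap >= 8.0
```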
- 2.1 Install conda on your system (see the official conda installation tutorial).
- 2.2 Create a conda environment:
```bash
cd $SpInfer_HOME
conda env create -f spinfer.yml
conda activate spinfer
```
- Build SpInfer. After this step, libSpMM_API.so and SpMM_API.cuh will be available for easy integration into your own projects (a linking sketch follows):
```bash
cd $SpInfer_HOME/build && make -j
```
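A minimal sketch of such an integration, assuming only the build layout shown above: `my_app.cu` is a placeholder for your own code, and the include path is an assumption, so check the repository for the actual location of SpMM_API.cuh and its entry points.
```bash
# Hypothetical integration sketch: compile your own CUDA file against the
# SpInfer SpMM library. my_app.cu and the -I path are placeholders.
nvcc -O3 -arch=sm_80 my_app.cu \
    -I$SpInfer_HOME/build \
    -L$SpInfer_HOME/build -lSpMM_API \
    -o my_app
# Make the shared library discoverable at run time:
export LD_LIBRARY_PATH=$SpInfer_HOME/build:$LD_LIBRARY_PATH
```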
- Build Sputnik.
```bash
cd $SpInfer_HOME/third_party/
source build_sputnik.sh
```
- Build SparTA.
```bash
cd $SpInfer_HOME/third_party/
source preparse_cusparselt.sh
```
- Reproduce Figure 10.
```bash
cd $SpInfer_HOME/kernel_benchmark
source test_env
make -j
source benchmark.sh
```
Check the results in the raw csv files and in the reproduced Figure10.png (Fig. 10).
Follow the steps in SpInfer/docs/LLMInferenceExample for:
- Building FasterTransformer with SpInfer, Flash-LLM, or standard integration.
- Downloading and converting the OPT models.
- Configuration. Note: Model_dir differs for SpInfer, Flash-LLM, and FasterTransformer (a hypothetical illustration follows this list).
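To make that note concrete, here is a purely hypothetical illustration; the actual directories come from your own model-conversion step as described in docs/LLMInferenceExample:
```bash
# Purely hypothetical paths: each build reads its own converted weights, so
# Model_dir must point at the matching conversion output for each baseline.
SPINFER_MODEL_DIR=$SpInfer_HOME/models/opt-30b-spinfer     # SpInfer build
FLASHLLM_MODEL_DIR=$FlashLLM_HOME/models/opt-30b-flashllm  # Flash-LLM build
FT_MODEL_DIR=$FT_HOME/models/opt-30b-ft                    # standard FT build
```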
```bash
cd $SpInfer_HOME/third_party/
bash run_1gpu_loop.sh
```
- Check the results (Fig. 13/14) in $SpInfer_HOME/third_party/FasterTransformer/OutputFile_1gpu_our_60_inlen64/.
- Test tensor_para_size=2 using `bash run_2gpu_loop.sh`.
- Test tensor_para_size=4 using `bash run_4gpu_loop.sh`.
```bash
cd $FlashLLM_HOME/third_party/
bash run_1gpu_loop.sh
```
- Check the results in $FlashLLM_HOME/third_party/FasterTransformer/OutputFile_1gpu_our_60_inlen64/.
- Test tensor_para_size=1 using `bash run_1gpu_loop.sh`.
```bash
cd $FT_HOME/third_party/
bash run_2gpu_loop.sh
```
- Check the results in $FT_HOME/FasterTransformer/OutputFile_2gpu_our_60_inlen64/.
```bash
cd $SpInfer_HOME/end2end_inference/ds_scripts
pip install -r requirements.txt
bash run_ds_loop.sh
```
- Check the results in $SpInfer_HOME/end2end_inference/ds_scripts/ds_result/.
If you find this work useful, please cite this project and our paper.
```bibtex
@inproceedings{fan2025spinfer,
  title={SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs},
  author={Fan, Ruibo and Yu, Xiangrui and Dong, Peijie and Li, Zeyu and Gong, Gu and Wang, Qiang and Wang, Wei and Chu, Xiaowen},
  booktitle={Proceedings of the Twentieth European Conference on Computer Systems},
  pages={243--260},
  year={2025}
}
```