vLLM

26 repositories

vllm
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda inference pytorch transformer openai moe llama gpt model-serving
Python
•
Apache License 2.0
•11k•60k•1.9k•1.2k•Updated Oct 17, 2025
vllm-spyre
Public
Community maintained hardware plugin for vLLM on Spyre
Python
•
Apache License 2.0
•26•35•5•15•Updated Oct 17, 2025
guidellm
Public
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Python
•
Apache License 2.0
•89•645•86•28•Updated Oct 17, 2025
ci-infra
Public
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
HCL
•42•23•1•19•Updated Oct 17, 2025
tpu-inference
Public
TPU inference for vLLM, with unified JAX and PyTorch support.
Python
•
Apache License 2.0
•7•89•6•30•Updated Oct 17, 2025
vllm-gaudi
Public
Community maintained hardware plugin for vLLM on Intel Gaudi
Python
•
Apache License 2.0
•51•12•1•54•Updated Oct 17, 2025
speculators
Public
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Python
•
Apache License 2.0
•11•60•4•17•Updated Oct 17, 2025
llm-compressor
Public
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
sparsity compression quantization
Python
•
Apache License 2.0
•260•2.1k•60•41•Updated Oct 17, 2025
flash-attention
Public
Fast and memory-efficient exact attention
Python
•
BSD 3-Clause "New" or "Revised" License
•2.1k•96•0•17•Updated Oct 17, 2025
vllm-ascend
Public
Community maintained hardware plugin for vLLM on Ascend
inference transformer model-serving mlops ascend llm llmops llm-serving vllm
Python
•
Apache License 2.0
•491•1.2k•572•182•Updated Oct 17, 2025
semantic-router
Public
Intelligent Mixture-of-Models Router for Efficient LLM Inference
python kubernetes rust golang mcp fine-tuning envoyproxy pii-detection mixture-of-models huggingface-transformers
Go
•
Apache License 2.0
•235•1.9k•79•20•Updated Oct 17, 2025
compressed-tensors
Public
A safetensors extension to efficiently store sparse quantized tensors on disk
Python
•
Apache License 2.0
•33•172•5•14•Updated Oct 17, 2025
aibrix
Public
Cost-efficient and pluggable Infrastructure components for GenAI inference
Go
•
Apache License 2.0
•468•4.3k•231•22•Updated Oct 17, 2025
recipes
Public
Common recipes to run vLLM
Jupyter Notebook
•
Apache License 2.0
•58•169•4•4•Updated Oct 17, 2025
vllm-project.github.io
Public
HTML
•32•20•0•1•Updated Oct 16, 2025
FlashMLA
Public
C++
•
MIT License
•884•6•0•3•Updated Oct 16, 2025
production-stack
Public
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Python
•
Apache License 2.0
•308•1.9k•85•57•Updated Oct 13, 2025
vllm-xpu-kernels
Public
The vLLM XPU kernels for Intel GPU
C++
•
Apache License 2.0
•14•9•0•6•Updated Oct 13, 2025
vllm-neuron
Public
Community maintained hardware plugin for vLLM on AWS Neuron
Python
•
Apache License 2.0
•0•10•0•0•Updated Oct 1, 2025
DeepGEMM
Public
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda
•
MIT License
•717•0•0•0•Updated Sep 29, 2025
vllm-openvino
Public
Python
•
Apache License 2.0
•7•23•2•0•Updated Aug 18, 2025
rfcs
Public
0•1•0•0•Updated Jun 3, 2025
vllm-project.github.io-static
Public archive
HTML
•
MIT License
•7•8•0•1•Updated Feb 7, 2025
media-kit
Public
vLLM Logo Assets
3•6•0•1•Updated Dec 12, 2024
vllm-nccl
Public archive
Manages vllm-nccl dependency
Python
•
Apache License 2.0
•3•17•2•0•Updated Jun 3, 2024
dashboard
Public
vLLM performance dashboard
Python
•
Apache License 2.0
•7•37•0•0•Updated Apr 26, 2024