    Repositories list

    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      11k60k1.9k1.2k Updated Oct 17, 2025
    • Community maintained hardware plugin for vLLM on Spyre
      Python
      2635515 Updated Oct 17, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      896458628 Updated Oct 17, 2025
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      4223119 Updated Oct 17, 2025
    • TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      789630 Updated Oct 17, 2025
    • Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      5112154 Updated Oct 17, 2025
    • A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      1160417 Updated Oct 17, 2025
    • Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      2602.1k6041 Updated Oct 17, 2025
    • Fast and memory-efficient exact attention
      Python
      2.1k96017 Updated Oct 17, 2025
    • Community maintained hardware plugin for vLLM on Ascend
      Python
      4911.2k572182 Updated Oct 17, 2025
    • Intelligent Mixture-of-Models Router for Efficient LLM Inference
      Go
      2351.9k7920 Updated Oct 17, 2025
    • A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      33172514 Updated Oct 17, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      4684.3k23122 Updated Oct 17, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      5816944 Updated Oct 17, 2025
    • HTML
      322001 Updated Oct 16, 2025
    • FlashMLA

      Public
      C++
      884603 Updated Oct 16, 2025
    • vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      3081.9k8557 Updated Oct 13, 2025
    • The vLLM XPU kernels for Intel GPU
      C++
      14906 Updated Oct 13, 2025
    • Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      01000 Updated Oct 1, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      717000 Updated Sep 29, 2025
    • Python
      72320 Updated Aug 18, 2025
    • rfcs

      Public
      0100 Updated Jun 3, 2025
    • HTML
      7801 Updated Feb 7, 2025
    • media-kit

      Public
      vLLM Logo Assets
      3601 Updated Dec 12, 2024
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      31720 Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      73700 Updated Apr 26, 2024