llm-inference

Here are 2,020 public repositories matching this topic...

nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

ai-chat llm-inference

Updated May 27, 2025
C++

ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated May 6, 2026
Python

gitleaks / gitleaks

Find secrets with Gitleaks 🔑

Updated Mar 25, 2026
Go

liguodongiot / llm-action

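This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and putting LLM applications into production).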

llm llmops llm-serving llm-training llm-inference

Updated Mar 12, 2026
HTML

Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

ai deep-learning artificial-intelligence large-language-models llm llms llm-inference

Updated May 1, 2026
Python

bentoml / OpenLLM

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

llama mistral fine-tuning mlops bentoml vicuna llm model-inference llmops llm-serving llm-inference open-source-llm llama2 openllm llm-ops llama3-1 llama3-2 llama3-2-vision

Updated Apr 27, 2026
Python

mistralai / mistral-inference

Official inference library for Mistral models

llm llm-inference mistralai

Updated Apr 20, 2026
Jupyter Notebook

openvinotoolkit / openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

nlp natural-language-processing ai computer-vision deep-learning transformers inference speech-recognition yolo recommendation-system performance-boost good-first-issue openvino diffusion-models stable-diffusion generative-ai llm-inference optimize-ai deploy-ai

Updated May 6, 2026
C++

Tiiny-AI / PowerInfer

High-speed Large Language Model Serving for Local Deployment

llama large-language-models llm local-inference llm-inference

Updated Jan 24, 2026
C++

bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated May 4, 2026
Python

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

llama cuda-kernels deepspeed llm fastertransformer llm-inference turbomind internlm llama2 codellama llama3

Updated Apr 29, 2026
Python

katanemo / plano

Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic.

proxy routing gateway prompt proxy-server openai envoy envoyproxy llms generative-ai llmops llm-inference llm-proxy ai-gateway llm-gateway llm-routing ai-gateway-support

Updated May 5, 2026
Rust

algorithmicsuperintelligence / openevolve

Open-source implementation of AlphaEvolve

genetic-algorithm discovery optimize evolutionary-algorithms deepmind-lab deepmind iterative-methods genetic-algorithms evolutionary-computation alphacode distributed-evolutionary-algorithms iterative-refinement llm-inference llm-engineering llm-ensemble coding-agent alpha-evolve alphaevolve openevolve

Updated Mar 18, 2026
Python

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

gpu cuda jit pytorch nvidia moe attention llm-inference large-large-models distributed-inference

Updated May 6, 2026
Python

kserve / kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Updated May 3, 2026
Go

superduper-io / superduper

Superduper: End-to-end framework for building custom AI applications and agents.

Updated Sep 1, 2025
Python

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

mla vllm llm-inference awesome-llm flash-attention tensorrt-llm paged-attention deepseek flash-attention-3 deepseek-v3 minimax-01 deepseek-r1 flash-mla qwen3

Updated Apr 20, 2026
Python

gpustack / gpustack

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Apr 30, 2026
Python

FellouAI / eko

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

Updated Mar 3, 2026
TypeScript

Michael-A-Kuykendall / shimmy

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

rust machine-learning transformers api-server developer-tools llama command-line-tool lora inference-server rust-crate huggingface huggingface-transformers huggingface-models llamacpp llm-inference local-ai gguf ollama-api openai-compatible

Updated Mar 26, 2026
Rust
