FastAPI LLM pool for local and OpenAI-compatible remote inference, with multimodal input, scheduling, replicas, metrics, and admin APIs.
Updated Jun 26, 2026. Language: Python.