This repository hosts community-maintained inference backends for GPUStack.
It is designed to provide a clear, lightweight, and open way for inference engine developers and advanced users to integrate their backends with GPUStack, without requiring changes to GPUStack core.
If you have an inference engine that already works in Docker and serves models reliably, you are welcome here.
The inference ecosystem evolves quickly across models, hardware platforms, and runtimes. No single project can maintain first-class support for every inference engine.
GPUStack positions itself as a high-performance and flexible Model-as-a-Service (MaaS) platform. This repository exists to:
- Embrace diversity in inference engines
- Lower the integration barrier for new backends
- Clearly separate core responsibilities from community ownership
- Allow proven community backends to grow naturally through real usage
A Community Inference Backend is:
- Integrated with GPUStack through a standardized spec
- Maintained by the community or upstream project
- Explicitly labeled as Community, not Built-in
- Free to evolve independently of GPUStack core releases
GPUStack does not guarantee:
- Runtime correctness
- Model compatibility
- Inference quality or performance
Community backends are currently available in the following categories:
- LLM
- Vision / OCR
- Embedding / Reranker / Sequence Classification
- Audio
  - Text-to-Speech
Each backend lives in its own directory and should include:
- `spec.yaml`: Backend metadata and definition (name, versions, parameters, etc.). Refer to `spec-template.yaml` for an example.
- `README.md`: Usage instructions, supported models, configuration, and limitations
- `logo.png`: Backend logo for display in the GPUStack inference backends catalog
Additional files may be included if required by the backend.
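As a quick pre-submission check, the directory layout above can be verified with a short script. The sketch below is a hypothetical helper, not part of this repository or of GPUStack. The function name `missing_backend_files` is illustrative, and it only checks that the three required files exist; it does not validate `spec.yaml` against `spec-template.yaml`.

```python
from pathlib import Path

# Files this README asks every backend directory to provide.
REQUIRED_FILES = ("spec.yaml", "README.md", "logo.png")


def missing_backend_files(backend_dir: str) -> list[str]:
    """Return the required files that are absent from backend_dir.

    Hypothetical pre-submission helper: it checks only that each file
    exists as a regular file, not that its contents are valid.
    """
    root = Path(backend_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, an empty list from `missing_backend_files("my-backend")` means the three required files are present in `my-backend/`.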
You are encouraged to contribute if you are:
- An inference engine developer or advanced user
- Running a backend successfully in real environments
- Experimenting with new hardware, accelerators, or runtimes
- Interested in sharing a useful backend with the community
Before submitting a backend, make sure:
- It can run in a container and serve APIs
- You know how to start it with a command
- It exposes a health or API endpoint GPUStack can check
- You understand its model compatibility at a basic level
That’s enough to get started.
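Before opening a PR, it can help to confirm locally that your containerized server actually becomes reachable on its health or API endpoint. The following is a minimal Python sketch under stated assumptions: it is not how GPUStack itself performs health checks, `wait_for_healthy` is a hypothetical name, and a URL such as `http://127.0.0.1:8000/health` is only a placeholder for whatever port and path your engine really exposes.

```python
import http.client
import time
import urllib.request


def wait_for_healthy(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers HTTP 200 or timeout seconds elapse.

    Hypothetical local readiness check; the URL is an assumption and
    should point at the health or API endpoint your backend exposes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Not ready yet: connection refused, timeout, or an error status
            # (urllib raises HTTPError, a subclass of OSError, for 4xx/5xx).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Start the container with your usual command, then call `wait_for_healthy("http://127.0.0.1:8000/health")`. A `False` result after the timeout usually means the server failed to start, is listening on a different port, or serves a different path.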
- Fork this repository
- Create a new directory for your backend
- Add `spec.yaml`, `README.md`, and `logo.png` (refer to `spec-template.yaml` for an example)
- Open a Pull Request
In your PR description, briefly explain:
- What problem this backend solves
- Who maintains it
- Whether it is used in real environments
By contributing a backend, you agree to:
- Own its long-term maintenance
- Fix obvious breakages when possible
- Update documentation if behavior changes
Backends that become unmaintained may be marked accordingly or removed.
- Built-in backends are maintained by GPUStack core
- Community backends live in this repository and are community-owned
Outstanding community backends — proven through real usage, adoption, and close integration — may be promoted to Built-in backends in the future.
This repository helps GPUStack remain:
- Neutral and extensible
- Open to innovation
- Scalable without overloading core maintainers
If you build a useful inference backend, this is where it can naturally connect with GPUStack users and grow.
We look forward to your contributions.