OpenAI API server for OpenVINO, for when OVMS (OpenVINO Model Server) is too big
Updated Jun 1, 2026 - Python
Field-tested guide: multi-GPU vLLM tensor-parallel (TP=2/TP=4) on Intel Arc Pro B70 (Battlemage BMG-G31, Xe2) on Linux. Driver setup (xe force_probe=e223), bare-metal vLLM + oneAPI 2025.3, the compute-runtime multi-root USM + triton-xpu init_devices fixes, FP8/int4-AutoRound quant, root-cause error reports. AI-agent readable (AGENTS.md).
Pioneer documentation: FluidX3D 3.6 LBM solver verified at 99.5% peak bandwidth on Intel Arc Pro B70 (Battlemage Xe2). 4x faster than RTX 3060 Ti. Patches: HiDPI font, windowed mode, FORCE_FIELD with VTK+CSV solid-boundary forces, _exit(0) workaround for xe-driver shutdown race. Companion to OpenFOAM/PETSc/Ginkgo/Paraview B70 sister repos.