Stormvino

OpenAI-compatible LLM server for Intel Arc GPUs. Runs local inference via OpenVINO and speaks the OpenAI API, so you can point any client that accepts a base_url at it. No NVIDIA required.

Hardware compatibility

GPU	VRAM	Status	Notes
Arc B60	24 GB	✅ Production	EnvyStorm reference machine
Arc B50	16 GB	🔜 Testing	TinyB — install in progress
Arc B65	TBD	🔜 Planned	Next after B50 confirmed
Arc B70	TBD	🔜 Planned
Other Arc	any	⚙️ Auto-tuned	VRAM detected at runtime

Detecting B-series cards: Battlemage GPUs often report as Intel(R) Graphics [0xExxx] (e.g. [0xe212]) rather than with the word "Arc"; both lspci and the OpenVINO device name omit it. Identify the discrete GPU by its OpenVINO device type (DISCRETE vs INTEGRATED), not by matching "Arc". If a detection step reports "no Arc GPU found" on a B-series card, the card is still fine: confirm it with clinfo or python -c "import openvino as ov; print(ov.Core().available_devices)" and continue.

OS: Linux Mint 22.x / Ubuntu 24.04 (Noble).

Kernel: Battlemage (B-series) cards need the xe driver. linux-oem-24.04 provides it, but a newer generic or mainline kernel (6.11+) that already loads xe and creates a /dev/dri/renderD* node for the card works too. The installer checks whether the GPU is already live and upgrades the kernel only if it isn't, so a working newer kernel won't be downgraded.

System RAM: 16 GB minimum (a 16 GB machine reports ~15 GiB usable).

Disk: 50 GB+ for a useful model set.
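The installer's real "is the GPU already live?" check lives in the runbooks. As a hypothetical sketch of the same idea: read which kernel driver is bound to each render node under /sys/class/drm (the `device/driver` entry is a sysfs symlink ending in the driver name) and treat the card as live once a node is bound to xe. The helper names are illustrative only, and the 6.11 threshold comes from the paragraph above.

```python
import os
import platform
import re


def kernel_at_least(release: str, minimum: tuple[int, int] = (6, 11)) -> bool:
    """True if a `uname -r` string such as "6.11.0-21-generic" is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def render_node_drivers(drm_root: str = "/sys/class/drm") -> dict[str, str]:
    """Map each renderD* node to the kernel driver bound to its PCI device."""
    drivers: dict[str, str] = {}
    if not os.path.isdir(drm_root):
        return drivers  # no DRM subsystem visible (container, non-Linux host)
    for node in sorted(os.listdir(drm_root)):
        link = os.path.join(drm_root, node, "device", "driver")
        if node.startswith("renderD") and os.path.exists(link):
            # The driver entry is a symlink ending in .../drivers/<name>.
            drivers[node] = os.path.basename(os.path.realpath(link))
    return drivers


def gpu_is_live(drivers: dict[str, str]) -> bool:
    """Treat the card as live once any render node is bound to xe.

    Caveat: newer Intel iGPUs can also use xe, so on such machines you
    would additionally check the PCI device ID before trusting this.
    """
    return "xe" in drivers.values()


if __name__ == "__main__":
    release = platform.release()
    drivers = render_node_drivers()
    print(f"kernel {release} (6.11+: {kernel_at_least(release)})")
    print(f"render nodes: {drivers}")
    print("GPU live: keep kernel" if gpu_is_live(drivers) else "xe not bound: upgrade needed")
```

On a typical dual-GPU box you would expect one node bound to i915 (the iGPU) and one bound to xe (the Battlemage card); only the absence of any xe-bound node should trigger the kernel upgrade.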

Install paths — pick one

🤖 Claude Code (recommended for single machine)

Fully automated. Claude Code (CC) asks 3 questions, then handles everything, including a kernel upgrade and reboot only if your GPU isn't already working. You just watch.

Step 1 — Install Claude Code if you haven't:

npm install -g @anthropic-ai/claude-code

Prerequisite: passwordless sudo for the install. The automated path runs system commands via sudo, and Claude Code's non-interactive shell can't answer a password prompt. Grant a temporary sudoers drop-in, and remove it when the install finishes:

echo "$USER ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/stormvino-install
sudo chmod 0440 /etc/sudoers.d/stormvino-install
# when the install is done:  sudo rm /etc/sudoers.d/stormvino-install

Step 2 — Clone the repo into your home dir and start CC there. Don't clone into /opt — it's root-owned, so the clone fails; the runbook creates and owns /opt/ov_server for you during install:

git clone https://github.com/Jermalk/stormvino.git ~/stormvino
cd ~/stormvino
claude

Step 3 — In the CC chat, type exactly:

Run the Stormvino installation runbook. @CC_INSTALL.md

The @CC_INSTALL.md mention loads the runbook directly — no file dragging needed. CC reads it and takes over. Answer the 3 questions it asks, then watch.

→ See CC_INSTALL.md for what CC does at each phase.

⚙️ Ansible (recommended for multiple machines / repeatable deploys)

One command installs on any number of Arc machines simultaneously. Detects GPU VRAM at runtime and tunes config automatically. Fully headless — handles reboots without human intervention.

git clone https://github.com/Jermalk/stormvino.git
cd stormvino
# edit vars/main.yml (3 lines) — then:
ansible-playbook -i hosts.yml stormvino.yml

→ See ANSIBLE.md for the full plan and current implementation status.

📖 Manual (full control, learn every step)

Step-by-step guide with a verification test between every phase. Covers kernel, drivers, Python env, PostgreSQL, models, and systemd services.

git clone https://github.com/Jermalk/stormvino.git
cd stormvino
./install.sh    # detects hardware, routes to the right path

→ See INSTALL.md.

What you get

Endpoint	Description
`POST /v1/chat/completions`	OpenAI-compatible chat, streaming supported
`POST /v1/embeddings`	Sentence embeddings (multilingual-e5-large)
`GET /v1/models`	List discovered models
`POST /v1/images/generations`	Image generation (SDXL, optional)
`POST /v1/audio/transcriptions`	Speech-to-text (Whisper, optional)
`POST /v1/audio/speech`	Text-to-speech (Kokoro / Piper, optional)
`GET /health`	Server health + loaded models + VRAM stats
`GET /monitor`	Web dashboard — live VRAM, throughput, request log

Default port: 11435. Accessible over LAN. Runs as an unprivileged stormvino systemd service (not root); the embedding model is offloaded to the iGPU when present, leaving the Arc's full VRAM for the LLM.
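The table lists streaming support for chat completions. Assuming Stormvino follows OpenAI's server-sent-events format (each chunk sent as a `data: {...}` line, the stream closed by `data: [DONE]`), a client can reassemble the reply like this. `iter_stream_text` is a hypothetical helper; the sample lines are hand-written, not captured from a real server.

```python
import json
from collections.abc import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from an OpenAI-style SSE chat stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ": keep-alive" comments, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk usually carries only the role, no content.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello
```

Against a live server you would send `"stream": true` in the request body and feed the decoded response lines (for example from `urllib.request.urlopen`) into the same function.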

Tested models (B60 / 24 GB VRAM)

Model	VRAM	Role
`qwen3-14b-int4-ov`	9.1 GB	Default — reasoning, coding, chat
`qwen3-8b-int4-ov`	4.6 GB	Agent turns, fast responses
`multilingual-e5-large-int8`	563 MB	Embeddings + task routing
`whisper-large-v3-int8-ov`	~2 GB	Speech-to-text
`qwen2.5-vl-7b-int4-ov`	~5 GB	Vision — image understanding

→ See MODELS.md for conversion instructions and VRAM budget tables.
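A rough way to sanity-check a model mix against a card's VRAM, using the weight figures from the table above (the "~" rows use their approximate value). The per-model KV cache mirrors the `kv_cache_size_gb` setting described under Configuration; the 1 GB runtime overhead is a hypothetical allowance, not a measured number. MODELS.md has the project's actual budget tables.

```python
# Weight footprints from the table above, in GB. The embedding model is
# left out because it is offloaded to the iGPU when one is present.
WEIGHTS_GB = {
    "qwen3-14b-int4-ov": 9.1,
    "qwen3-8b-int4-ov": 4.6,
    "whisper-large-v3-int8-ov": 2.0,
    "qwen2.5-vl-7b-int4-ov": 5.0,
}


def vram_needed(models: list[str], kv_cache_gb: float, overhead_gb: float = 1.0) -> float:
    """Weights + one KV cache per loaded model + a fixed runtime overhead.

    `overhead_gb` is a HYPOTHETICAL allowance for driver/runtime buffers.
    """
    weights = sum(WEIGHTS_GB[name] for name in models)
    return weights + kv_cache_gb * len(models) + overhead_gb


def fits(models: list[str], kv_cache_gb: float, vram_gb: float) -> bool:
    """True if the estimated footprint fits in `vram_gb`."""
    return vram_needed(models, kv_cache_gb) <= vram_gb


pair = ["qwen3-14b-int4-ov", "qwen3-8b-int4-ov"]
print(f"{vram_needed(pair, kv_cache_gb=2.0):.1f} GB")  # 18.7 GB
print(fits(pair, 2.0, vram_gb=24.0), fits(pair, 2.0, vram_gb=16.0))  # True False
```

Under these assumptions the default pair fits on a 24 GB B60 but not on a 16 GB B50, which is why `max_loaded_models` is tuned per VRAM tier.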

Quick health check

curl -s http://localhost:11435/health | python3 -m json.tool

curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-8b-int4-ov","messages":[{"role":"user","content":"Hello"}]}'
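The same chat check from Python using only the standard library. `build_chat_request`, `reply_text` and `chat` are hypothetical helper names; the response shape (`choices[0].message.content`) is the standard OpenAI one that an OpenAI-compatible server is expected to return. The 120 s timeout is an arbitrary choice to leave room for a cold model load.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435/v1"  # default port from this README


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-format completion response."""
    return response["choices"][0]["message"]["content"]


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply text (needs a running server)."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return reply_text(json.load(resp))
```

With the server running, `chat("qwen3-8b-int4-ov", "Hello")` should return the model's reply as a string.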

Library stack

Inference (server runtime)

Library	Version
openvino	2026.1.0
openvino-genai	2026.1.0.0
openvino-tokenizers	2026.1.0.0
infergate	0.2.0
optimum-intel	1.27.0
optimum	2.1.0
transformers	4.57.6
tokenizers	0.22.2

Model conversion (offline, via optimum-cli)

Library	Version
nncf	3.1.0
onnx	1.21.0
onnxruntime	1.25.0
safetensors	0.7.0
huggingface_hub	0.36.2
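A quick way to compare an environment against the inference pins above, using `importlib.metadata` from the standard library. The pins are copied from the table; `version_mismatches` is a hypothetical helper, and the repo's requirements-server.txt remains the authoritative list.

```python
from importlib import metadata

# Pins copied from the "Inference (server runtime)" table above.
EXPECTED = {
    "openvino": "2026.1.0",
    "openvino-genai": "2026.1.0.0",
    "openvino-tokenizers": "2026.1.0.0",
    "infergate": "0.2.0",
    "optimum-intel": "1.27.0",
    "optimum": "2.1.0",
    "transformers": "4.57.6",
    "tokenizers": "0.22.2",
}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            problems[package] = (wanted, installed)
    return problems


if __name__ == "__main__":
    for pkg, (wanted, got) in version_mismatches(EXPECTED).items():
        print(f"{pkg}: expected {wanted}, found {got or 'not installed'}")
```

Run it inside the server's virtualenv; no output means every pin matches.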

Configuration

Runtime settings live in config.json. The installers auto-patch these key settings based on the detected GPU VRAM:

Key	Description
`device`	OpenVINO device — auto-detected (e.g. `GPU.1`)
`kv_cache_size_gb`	KV cache per model — tuned to VRAM tier
`max_loaded_models`	Models held in VRAM simultaneously
`default_model`	Model used when client doesn't specify
`embedding_model`	Embedding model directory name
`postgres_dsn`	Observability database connection string

Full reference: INSTALL.md § Phase 7.
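A sketch of what "auto-patched based on detected GPU VRAM" can look like. Only the key names come from the table above; the tier thresholds and values are hypothetical placeholders, not the installers' real numbers. The thresholds sit below the nameplate sizes on the assumption that a card may report slightly less VRAM than advertised.

```python
import json
from pathlib import Path

# HYPOTHETICAL tiers: (minimum reported VRAM in GB, settings to write).
VRAM_TIERS = [
    (20, {"kv_cache_size_gb": 4, "max_loaded_models": 2}),  # e.g. 24 GB cards
    (12, {"kv_cache_size_gb": 2, "max_loaded_models": 1}),  # e.g. 16 GB cards
    (0, {"kv_cache_size_gb": 1, "max_loaded_models": 1}),   # anything smaller
]


def settings_for_vram(vram_gb: float) -> dict:
    """Return a copy of the settings of the first tier the card reaches."""
    for minimum, settings in VRAM_TIERS:
        if vram_gb >= minimum:
            return dict(settings)
    raise ValueError(f"no tier for {vram_gb} GB")


def patch_config(path: Path, device: str, vram_gb: float) -> dict:
    """Merge detected values into config.json, keeping every other key."""
    config = json.loads(path.read_text()) if path.exists() else {}
    config["device"] = device
    config.update(settings_for_vram(vram_gb))
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Merging into the existing file, rather than regenerating it, keeps hand-set keys such as `default_model` and `postgres_dsn` intact across re-runs.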

Architecture

Layer	Component
HTTP	FastAPI + Uvicorn, single worker
LLM inference	`openvino_genai.LLMPipeline`, executor-offloaded
VLM inference	`openvino_genai.VLMPipeline`
Embeddings	`OVModelForFeatureExtraction` (optimum-intel)
Task routing	Embedding similarity + signal detection
STT	`openvino_genai.WhisperPipeline`
TTS	Kokoro-ONNX (EN) + Piper (PL)
Observability	PostgreSQL 16 + pgvector
Monitor UI	Svelte + uPlot
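The table says LLM inference is "executor-offloaded": the blocking generate call runs in a worker thread so the single Uvicorn worker's event loop stays free for other requests. A minimal asyncio sketch of that pattern; `blocking_generate` is a hypothetical stand-in for `LLMPipeline.generate`, and the one-thread executor (which serialises GPU work) is an assumption about the design, not a description of Stormvino's code.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# ASSUMPTION: one worker thread, so generation calls run one at a time.
GPU_EXECUTOR = ThreadPoolExecutor(max_workers=1)


def blocking_generate(prompt: str) -> str:
    """Hypothetical stand-in for a blocking LLMPipeline.generate() call."""
    time.sleep(0.05)  # simulate inference latency
    return prompt.upper()


async def generate(prompt: str) -> str:
    """Run the blocking call in the executor and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(GPU_EXECUTOR, blocking_generate, prompt)


async def heartbeat(stop: asyncio.Event) -> int:
    """Count loop ticks; it keeps ticking only if nothing blocks the loop."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def main() -> tuple[list[str], int]:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    replies = await asyncio.gather(generate("hello"), generate("world"))
    stop.set()
    return list(replies), await beat


if __name__ == "__main__":
    replies, ticks = asyncio.run(main())
    print(replies, f"heartbeat ticks during inference: {ticks}")
```

The heartbeat keeps counting while both prompts are processed, which is the point of offloading: cheap routes such as /health and /monitor stay responsive during a long generation.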

Hardware reports welcome

Tested Stormvino on a GPU not in the compatibility table? Open a hardware report issue with the GPU model, VRAM, kernel version, and tokens/sec. Every report helps build the compatibility matrix for everyone.

Origin

Stormvino grew out of Shangri-Lab — a personal lab built by an IT architect from Silesia who had no Python background, a pair of Intel Arc GPUs, and a firm belief that local inference shouldn't require Nvidia hardware or magic frameworks.

The philosophy is unchanged: build the simplest thing that gives full visibility first, tune quality only after you can observe it.

Built with Claude Code.
