
32GB VRAM Docker setup, E2E tests, reference preencode/upload, streaming support#1193

Draft
konovalov-nk wants to merge 4 commits into fishaudio:main from konovalov-nk:feature/32gb-docker-e2e

Conversation


@konovalov-nk konovalov-nk commented Mar 15, 2026

Streaming works on this branch: low TTFA (~400 ms with the torch.compile flag), and chunks are sent as soon as they're ready. Includes a Docker setup and scripts for ~32 GB GPUs (e.g. RTX 5090), so anyone can try it. Draft, don't merge 🤣

Branch: konovalov-nk/fish-speech@feature/32gb-docker-e2e
Docs: docs/docker-32gb-rtx5090.md

Minimal run (from repo root): make run-server starts the API in Docker, then make e2e runs the smoke test. make help lists all targets.

  • scripts/run_server_32gb.sh, WORKSPACE_DIR for nested repo
  • KV cache / memory: clear_caches(), FISH_CACHE_MAX_SEQ_LEN, /v1/debug/memory
  • E2E: scripts/e2e_smoke.sh, e2e_memory.sh; uses reference_id from server when available
  • References: preencode, upload_references.sh, POST /v1/references/add_encoded (skip if hash matches)
  • Makefile: run-server, e2e, preencode, upload-references, test

What's left to do:

  1. Stream tokens into vocoder with a schedule (per lengyue), not one big chunk.
  2. Cut memory use more and improve TTFA (profile, smaller first chunk, CUDA graphs).
  3. Support longer prompts (~30–50 words) for agent TTS without OOM.
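TODO #1 (streaming tokens into the vocoder on a schedule rather than one big chunk) could look roughly like the sketch below: a small first chunk to keep TTFA low, then geometrically growing chunks for throughput. The parameter names and growth policy are illustrative assumptions, not the schedule the PR will land.

```python
def chunk_schedule(total_tokens: int, first: int = 16,
                   growth: float = 2.0, cap: int = 256):
    """Yield (start, end) token ranges for feeding the vocoder.

    Sketch only: a tiny first chunk minimizes time-to-first-audio,
    and later chunks grow (up to `cap`) to amortize per-call overhead.
    """
    start = 0
    size = float(first)
    while start < total_tokens:
        end = min(start + int(size), total_tokens)
        yield start, end
        start = end
        size = min(size * growth, float(cap))
```

For 100 tokens with the defaults this emits ranges of 16, 32, and then the remaining 52 tokens, so audio starts playing after only the first 16 tokens are decoded.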

konovalov-nk and others added 2 commits March 15, 2026 05:31
- Docker: run_server_32gb.sh, WORKSPACE_DIR for nested repo, docs/docker-32gb-rtx5090.md
- Memory: clear_caches() after request, FISH_CACHE_MAX_SEQ_LEN/MAX_NEW_TOKENS_CAP, /v1/debug/memory
- E2E: scripts/e2e_smoke.sh, e2e_memory.sh; use reference_id from server when available
- References: preencode (scripts/preencode.sh), upload (upload_references.sh), add_encoded API, hash skip
- Makefile: run-server, e2e, e2e-memory, preencode, preencode-upload, upload-references, test
- Default voice refs dir: data/voice_references
- .gitignore: memory_metrics.jsonl, .pytest_cache, memory_snapshot_*.pickle
- Fix: views non-streaming TTS (engine.inference), add_reference finally indentation

Made-with: Cursor
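The memory_metrics.jsonl file mentioned above suggests one record per measurement; a consumer might summarize it like this. The field name `allocated_mb` is an assumption about the file's schema, not confirmed by the PR.

```python
import json


def peak_allocated(jsonl_text: str) -> float:
    """Return the peak 'allocated_mb' across JSONL memory records.

    Hypothetical reader for a file like memory_metrics.jsonl; the real
    schema written by e2e_memory.sh may use different field names.
    """
    peak = 0.0
    for line in jsonl_text.splitlines():
        if line.strip():
            peak = max(peak, float(json.loads(line).get("allocated_mb", 0.0)))
    return peak
```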
@konovalov-nk konovalov-nk changed the title 32GB VRAM Docker setup, E2E tests, reference preencode/upload 32GB VRAM Docker setup, E2E tests, reference preencode/upload, streaming support Mar 15, 2026
konovalov-nk and others added 2 commits March 15, 2026 08:04
…TFA metrics, inductor warning filter, warmup logs

- run_server_32gb.sh: --entrypoint for huggingface-cli download (avoid start_webui.sh/uv in container)
- e2e_smoke.sh: curl --compressed for JSON endpoints; jq parse fallback; ttfa_smoke.py for streaming + oneshot TTFA/total_s
- ttfa_smoke.py: one TTS request with timing (ttfa_s, ttfa_audio_s, total_s), supports --oneshot
- api_server.py: filter UserWarning from torch._inductor (Logical operators and/or deprecated)
- model_manager.py: clear warmup logs (torch.compile enabled / warmup finished) so compile progress is visible
- Makefile: pass COMPILE to run-server, remove e2e-compile target, update help

Made-with: Cursor
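The timing reported by ttfa_smoke.py (ttfa_s, total_s) can be captured with a small wrapper over any chunk iterator. This is a minimal sketch of the measurement idea only; the real script issues an HTTP TTS request and also reports ttfa_audio_s.

```python
import time


def measure_ttfa(chunks):
    """Time a streaed-audio iterator: time-to-first-chunk and total time.

    Sketch of the metric ttfa_smoke.py reportedly computes; the chunk
    source here is any iterable of bytes, not a live TTS stream.
    """
    t0 = time.perf_counter()
    ttfa_s = None
    total_bytes = 0
    for chunk in chunks:
        if ttfa_s is None:
            # First chunk observed: this latency is the TTFA.
            ttfa_s = time.perf_counter() - t0
        total_bytes += len(chunk)
    return {"ttfa_s": ttfa_s,
            "total_s": time.perf_counter() - t0,
            "bytes": total_bytes}
```

In the streaming case ttfa_s should be much smaller than total_s; in a one-shot request the two converge, which is what the --oneshot comparison surfaces.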