32GB VRAM Docker setup, E2E tests, reference preencode/upload, streaming support #1193
Draft
konovalov-nk wants to merge 4 commits into fishaudio:main from konovalov-nk:feature/32gb-docker-e2e
Conversation
- Docker: run_server_32gb.sh, WORKSPACE_DIR for nested repo, docs/docker-32gb-rtx5090.md
- Memory: clear_caches() after request, FISH_CACHE_MAX_SEQ_LEN / MAX_NEW_TOKENS_CAP, /v1/debug/memory
- E2E: scripts/e2e_smoke.sh, e2e_memory.sh; use reference_id from server when available
- References: preencode (scripts/preencode.sh), upload (upload_references.sh), add_encoded API, hash skip
- Makefile: run-server, e2e, e2e-memory, preencode, preencode-upload, upload-references, test
- Default voice refs dir: data/voice_references
- .gitignore: memory_metrics.jsonl, .pytest_cache, memory_snapshot_*.pickle
- Fix: views non-streaming TTS (engine.inference), add_reference finally indentation

Made-with: Cursor
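The "hash skip" on reference uploads amounts to: hash the pre-encoded reference locally and skip the `POST /v1/references/add_encoded` call when the server already reports the same digest. A minimal sketch of that check (the helper names and the shape of the server's hash map are illustrative assumptions, not this PR's actual code):

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def should_upload(local_path: Path, server_hashes: dict, ref_id: str) -> bool:
    """True when the server does not already hold this reference with the
    same content hash, i.e. the add_encoded call is actually needed."""
    return server_hashes.get(ref_id) != file_sha256(local_path)
```

In a script like upload_references.sh the server-side hashes would come from a listing endpoint; here they are passed in as a plain dict to keep the sketch self-contained.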
(pre-commit.ci auto-fix commit; for more information, see https://pre-commit.ci)
…TFA metrics, inductor warning filter, warmup logs
- run_server_32gb.sh: --entrypoint for huggingface-cli download (avoid start_webui.sh/uv in container)
- e2e_smoke.sh: curl --compressed for JSON endpoints; jq parse fallback; ttfa_smoke.py for streaming + oneshot TTFA/total_s
- ttfa_smoke.py: one TTS request with timing (ttfa_s, ttfa_audio_s, total_s), supports --oneshot
- api_server.py: filter UserWarning from torch._inductor ("Logical operators and/or deprecated")
- model_manager.py: clear warmup logs (torch.compile enabled / warmup finished) so compile progress is visible
- Makefile: pass COMPILE to run-server, remove e2e-compile target, update help

Made-with: Cursor
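The TTFA numbers ttfa_smoke.py reports come down to timing the gap between issuing the request and receiving the first streamed chunk. A minimal sketch of that timing logic, with a fake generator standing in for the real HTTP stream (the function names here are illustrative, not the script's actual API):

```python
import time
from typing import Iterable, Iterator


def measure_ttfa(chunks: Iterable[bytes]) -> dict:
    """Time-to-first-audio (ttfa_s) and total stream duration (total_s)
    for an iterable of audio chunks."""
    start = time.monotonic()
    ttfa_s = None
    n_bytes = 0
    for chunk in chunks:
        if ttfa_s is None:
            # First chunk arrived: this gap is the TTFA.
            ttfa_s = time.monotonic() - start
        n_bytes += len(chunk)
    return {"ttfa_s": ttfa_s, "total_s": time.monotonic() - start, "bytes": n_bytes}


def fake_stream() -> Iterator[bytes]:
    """Stand-in for a streaming TTS response: a model delay, then chunks."""
    time.sleep(0.05)
    for _ in range(3):
        yield b"\x00" * 1024
        time.sleep(0.01)
```

The --oneshot path is the degenerate case: the whole response arrives as one chunk, so ttfa_s and total_s nearly coincide.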
(pre-commit.ci auto-fix commit; for more information, see https://pre-commit.ci)
Streaming works on this branch: low TTFA (~400ms with the torch.compile flag), chunks arrive as they're ready. Docker + scripts for ~32GB GPUs (e.g. RTX 5090), so anyone can try it. Draft, don't merge 🤣

Branch: konovalov-nk/fish-speech@feature/32gb-docker-e2e
Docs: docs/docker-32gb-rtx5090.md

Minimal run, from repo root: make run-server (start API in Docker), then make e2e (smoke test). make help lists all targets.

- scripts/run_server_32gb.sh, WORKSPACE_DIR for nested repo
- clear_caches(), FISH_CACHE_MAX_SEQ_LEN, /v1/debug/memory
- scripts/e2e_smoke.sh, e2e_memory.sh; uses reference_id from server when available
- upload_references.sh, POST /v1/references/add_encoded (skip if hash matches)

What's left to do: