Update inference_cli.py: long videos and low-RAM refactor #462
Will-I4M commented on Dec 31, 2025
- Multi-GPU streaming refactor: added --recycle_workers_every and changed dispatch to process per-GPU sub-segments in cycles, respawning workers each cycle to release allocator memory.
- Spill-based stitching: introduced a per-chunk disk spill and an incremental stitcher that blends overlaps while streaming to a persistent writer (or PNGs), without holding full segments in RAM.
- Output controls: the ffmpeg writer now supports configurable codec/pix_fmt/bitrate/CRF/preset and higher bit depths; spill files are stored as uint8/uint16 depending on --output_bitdepth.
- Resilience: added OOM retry with cleanup/backoff for chunk processing; disk saves use atomic writes with retries.
- Diagnostics: RAM telemetry per chunk/phase, per-PID monitors for workers, and detailed logging around chunk processing, saving, and stitching.
- Input handling: explicit freeing of input tensors (new_frames) between streamed chunks to reduce transient RAM.
- CLI additions: --spill_dir, --output_bitdepth, --video_codec, --video_pix_fmt, --video_crf, --video_bitrate, --video_preset, --recycle_workers_every.
- Minor safety: pre-parse the CUDA device; auto-enable streaming for long videos when --chunk_size is unset, to avoid loading the whole video at once.
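To illustrate the spill-based stitching idea, here is a minimal sketch of an incremental stitcher that holds only the overlapping frames in memory. It assumes consecutive chunks share a fixed number of overlap frames and that a frame supports multiplication by a float and addition (NumPy arrays and plain floats both do). OverlapStitcher and sink are hypothetical names, not the PR's actual API, and the linear crossfade is one common blending choice, not necessarily the one the PR uses.

```python
from typing import Any, Callable, List, Sequence


class OverlapStitcher:
    """Streams chunks to a sink, crossfading the frames that consecutive chunks share.

    Hypothetical sketch: only the last `overlap` frames of the previous chunk are
    held in memory; every other frame is handed to `sink` (a video writer, a PNG
    saver, ...) as soon as it is final.
    """

    def __init__(self, overlap: int, sink: Callable[[Any], None]) -> None:
        if overlap < 0:
            raise ValueError("overlap must be >= 0")
        self.overlap = overlap
        self.sink = sink
        self._tail: List[Any] = []  # previous chunk's trailing frames, not yet written

    def feed(self, chunk: Sequence[Any]) -> None:
        frames = list(chunk)
        start = 0
        if self._tail:
            n = len(self._tail)
            if len(frames) < n:
                raise ValueError("chunk is shorter than the pending overlap")
            for i in range(n):
                # Linear ramp: the new chunk's weight grows from 1/(n+1) to n/(n+1).
                w = (i + 1) / (n + 1)
                self.sink(self._tail[i] * (1.0 - w) + frames[i] * w)
            start = n
        # Hold back this chunk's last `overlap` frames so the next chunk can blend into them.
        keep = min(self.overlap, len(frames) - start)
        cut = len(frames) - keep
        for frame in frames[start:cut]:
            self.sink(frame)
        self._tail = frames[cut:]

    def finish(self) -> None:
        """Flush the held-back tail once the last chunk has been fed."""
        for frame in self._tail:
            self.sink(frame)
        self._tail = []
```

In a real pipeline, each chunk would presumably be reloaded from its spill file just before feed(), and sink would write to the persistent ffmpeg writer, so peak RAM is roughly one chunk plus the overlap rather than the whole segment.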
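The "atomic writes with retries" item can be sketched with the standard write-to-temp-then-rename pattern. This is a hedged illustration, not the PR's code: atomic_write_bytes is a hypothetical helper, it assumes the temp file and the target live on the same filesystem (required for os.replace to be an atomic rename on POSIX), and the exponential backoff schedule is an assumption.

```python
import os
import tempfile
import time
from typing import Callable


def atomic_write_bytes(
    path: str,
    data: bytes,
    retries: int = 3,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write `data` to `path` so readers never observe a partially written file.

    The bytes go to a temporary file in the same directory, are fsync'ed, and the
    temp file is then renamed over `path` with os.replace(). Failed attempts are
    retried with exponential backoff (delay, 2*delay, 4*delay, ...).
    """
    directory = os.path.dirname(os.path.abspath(path))
    for attempt in range(retries + 1):
        tmp = None
        try:
            fd, tmp = tempfile.mkstemp(dir=directory, prefix=".spill-", suffix=".part")
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # make sure the bytes hit the disk before the rename
            os.replace(tmp, path)  # same-filesystem rename: atomic on POSIX
            return
        except OSError:
            if tmp is not None and os.path.exists(tmp):
                os.unlink(tmp)  # don't leave half-written temp files behind
            if attempt == retries:
                raise
            sleep(delay * (2 ** attempt))
```

A crash mid-write then leaves at most a stray .part file, never a truncated spill file that the stitcher would later read as valid.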
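The output controls (--video_codec, --video_pix_fmt, --video_crf, --video_bitrate, --video_preset, --output_bitdepth) map naturally onto ffmpeg arguments for a rawvideo pipe. The sketch below is hypothetical: build_ffmpeg_cmd and spill_dtype are illustrative names, the choice of rgb48le (16-bit little-endian RGB) on the pipe for bit depths above 8, the yuv420p10le default output format, and treating CRF and bitrate as mutually exclusive are all assumptions, and 10-bit output requires an encoder build that supports it.

```python
from typing import List, Optional


def spill_dtype(bitdepth: int) -> str:
    """NumPy dtype name for spilled frames: 8-bit output fits uint8, deeper output uses uint16."""
    return "uint8" if bitdepth <= 8 else "uint16"


def build_ffmpeg_cmd(
    path: str,
    width: int,
    height: int,
    fps: float,
    bitdepth: int = 8,
    codec: str = "libx264",
    pix_fmt: Optional[str] = None,
    crf: Optional[int] = None,
    bitrate: Optional[str] = None,
    preset: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg command that encodes raw RGB frames read from stdin."""
    if crf is not None and bitrate is not None:
        # This sketch treats CRF (constant quality) and bitrate (target rate) as alternatives.
        raise ValueError("pass either crf or bitrate, not both")
    in_fmt = "rgb24" if bitdepth <= 8 else "rgb48le"  # 8 or 16 bits per channel on the pipe
    out_fmt = pix_fmt or ("yuv420p" if bitdepth <= 8 else "yuv420p10le")
    cmd = [
        "ffmpeg", "-y", "-loglevel", "warning",
        "-f", "rawvideo", "-pix_fmt", in_fmt,
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",  # read frames from stdin
        "-c:v", codec, "-pix_fmt", out_fmt,
    ]
    if preset is not None:
        cmd += ["-preset", preset]
    if crf is not None:
        cmd += ["-crf", str(crf)]
    if bitrate is not None:
        cmd += ["-b:v", bitrate]
    cmd.append(path)
    return cmd
```

Frames would then be written to the process's stdin as arr.astype(spill_dtype(bitdepth)).tobytes(); on little-endian hosts, NumPy's uint16 byte order already matches rgb48le.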
Thank you for the PR @Will-I4M; there are quite a lot of changes. Did you test it with single and multi-GPU, including on long videos? Is there anything in the code update I need to consider when doing the review? I'll try to get to it next week; I'm unavailable this week. Thanks again.
Indeed, my apologies for pushing several feature changes all at once. Perhaps it would be wiser to split inference_cli into two parts, keeping the common sections shared and specializing one version for low RAM usage. In this PR, even without --debug, the code prints several messages about RAM usage, which isn't necessarily desirable for everyone, although I find it very useful given SeedVR2's high RAM and VRAM requirements.

I tested it in single-GPU and multi-GPU configurations. My main focus in this PR was RAM usage. There is generally better resilience to VRAM-related OOMs, but that's not the biggest change. The biggest change is this: in the initial version, all chunks are kept in RAM after processing until the final step, whereas in this version only what's necessary (for stitching, etc.) is kept in RAM.

Also, in the initial version in multi-GPU mode, if a single process crashed during computation, the others would still continue unnecessarily until the end. The code is now more resilient and retries after fairly aggressive memory-cleanup attempts. I tested several strategies to free up RAM, all of which proved unsuccessful until this latest version: the system no longer swaps unnecessarily.

I was able to process several hours of video with the 7B model (fp16, 4K, 21 frames of context) on 4x3090 in a reasonable time, something I couldn't do before (even for 25-minute videos), so I think this work can be useful to the SeedVR2 user community.
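The multi-GPU failure mode described above (one worker crashes, the others keep running) is usually handled with a fail-fast supervisor. Below is a hedged sketch: supervise and the Worker protocol are hypothetical names, and it relies only on the is_alive(), exitcode, terminate() and join() members that multiprocessing.Process provides, so it can be exercised with fake workers.

```python
import time
from typing import Callable, Optional, Protocol, Sequence


class Worker(Protocol):
    """The subset of multiprocessing.Process that the supervisor relies on."""

    exitcode: Optional[int]

    def is_alive(self) -> bool: ...
    def terminate(self) -> None: ...
    def join(self, timeout: Optional[float] = None) -> None: ...


def supervise(
    workers: Sequence[Worker],
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait for every worker; if one exits with a non-zero code, stop the others and raise."""
    pending = list(workers)
    while pending:
        for w in list(pending):
            if w.is_alive():
                continue
            pending.remove(w)
            if w.exitcode != 0:
                # Fail fast: no point letting the other GPUs finish output that can't be stitched.
                for other in pending:
                    other.terminate()
                for other in pending:
                    other.join(timeout=10)
                raise RuntimeError(
                    f"worker exited with code {w.exitcode}; terminated {len(pending)} other(s)"
                )
        if pending:
            sleep(poll_interval)
```

Combined with per-cycle respawning (the idea behind --recycle_workers_every), each cycle would start one fresh process per GPU on its sub-segment, call supervise() on them, and only then start the next cycle, so every worker's allocator memory goes back to the OS when it exits.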
|
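The "retries after memory cleanup" behaviour (OOM retry with cleanup/backoff in the PR description) can be sketched as a small wrapper. run_with_oom_retry is a hypothetical helper; it defaults to Python's MemoryError and gc.collect so it runs without PyTorch. In a CUDA pipeline one would presumably pass oom_types=(torch.cuda.OutOfMemoryError,) (available in recent PyTorch versions) and a cleanup callable that runs gc.collect() followed by torch.cuda.empty_cache().

```python
import gc
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_oom_retry(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 2.0,
    oom_types: Tuple[Type[BaseException], ...] = (MemoryError,),
    cleanup: Callable[[], object] = gc.collect,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on an out-of-memory error, clean up, back off, and try again.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `retries` retries are exhausted.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fn()
        except oom_types:
            if attempt == retries:
                raise
            cleanup()  # e.g. gc.collect(); with CUDA, also torch.cuda.empty_cache()
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The backoff gives other processes (or the driver) time to release memory before the chunk is retried, instead of failing the whole run on a transient spike.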
Pardon my ignorance: is this something that would, or could, benefit the ComfyUI version as well?
|
|
Hi. BTW, your file still has the bug where "--prepend_frames" doesn't work.
|
I have the same problem as above with --video_backend ffmpeg hanging. |
|
OK, thank you. I'll have a look at this next week.
> On Sat, Jan 17, 2026 at 7:03 PM, thehhmdb wrote (numz/ComfyUI-SeedVR2_VideoUpscaler#462):
> I have the same problem as above with --video_backend ffmpeg hanging.
The FFMPEGVideoWriter.release() method called self.proc.wait() without a timeout, causing the entire process to hang forever if ffmpeg didn't terminate cleanly after stdin was closed. This was reported in PR numz#462, where chunk-streaming with --video_backend ffmpeg would hang on the last chunk without any error message.

Changes:
- Add a 120 s timeout to proc.wait() with SIGTERM/SIGKILL escalation
- Capture ffmpeg stderr via a background thread for diagnostics (previously stderr=DEVNULL swallowed all error output)
- Add -loglevel warning to the ffmpeg args to minimize stderr volume while still capturing meaningful errors
- Include ffmpeg stderr in error messages for write() and release()
- Add a __del__ safety net to clean up the ffmpeg process on GC

https://claude.ai/code/session_01Qqj52TTGPz6BPHFGMYqALr
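The release fix described above (bounded wait, SIGTERM then SIGKILL, stderr drained on a background thread) can be sketched with the standard subprocess module. PipeWriter is a hypothetical stand-in, not the actual FFMPEGVideoWriter; it accepts any command, so an ffmpeg rawvideo command line works, and the tests use a Python child process instead of ffmpeg. The 10 s grace period after SIGTERM is an assumption.

```python
import subprocess
import threading
from typing import List


class PipeWriter:
    """Feeds raw bytes to a child process's stdin (e.g. an ffmpeg rawvideo encoder)."""

    def __init__(self, cmd: List[str]) -> None:
        self.proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
        self._stderr_lines: List[str] = []
        # Drain stderr on a background thread so a chatty child can't block on a full pipe.
        self._reader = threading.Thread(target=self._drain_stderr, daemon=True)
        self._reader.start()

    def _drain_stderr(self) -> None:
        assert self.proc.stderr is not None
        for raw in self.proc.stderr:
            self._stderr_lines.append(raw.decode(errors="replace").rstrip())

    def stderr_tail(self, n: int = 20) -> str:
        return "\n".join(self._stderr_lines[-n:])

    def write(self, data: bytes) -> None:
        assert self.proc.stdin is not None
        try:
            self.proc.stdin.write(data)
        except OSError as exc:  # BrokenPipeError if the encoder already died
            raise RuntimeError(f"encoder exited early:\n{self.stderr_tail()}") from exc

    def release(self, timeout: float = 120.0) -> int:
        if self.proc.stdin is not None and not self.proc.stdin.closed:
            try:
                self.proc.stdin.close()  # EOF tells ffmpeg to flush and finalize the file
            except BrokenPipeError:
                pass
        try:
            code = self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.terminate()  # SIGTERM on POSIX
            try:
                code = self.proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                self.proc.kill()  # SIGKILL on POSIX
                code = self.proc.wait()
        self._reader.join(timeout=5)
        return code
```

Because stderr is collected rather than sent to DEVNULL, a non-zero return code from release() can be reported together with stderr_tail(), which is what makes the previously silent hang diagnosable.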