Hi, first of all great work on Vista4D — really impressive results!
I've been trying to run inference on a single RTX 5090 (32GB VRAM), and it's extremely slow because the 14B Wan model constantly offloads between VRAM and system RAM (the full model is ~64GB in bfloat16).
A few questions:

- Are there plans to release an fp8 or otherwise quantized version of the checkpoint that would fit within 32GB VRAM?
- Is there a recommended minimum VRAM for reasonable single-GPU inference speeds?
- Would a ComfyUI integration be something you're considering?
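In the meantime, here's roughly what I've been experimenting with on my end: weight-only fp8 quantization of the 14B transformer via torchao, plus diffusers' model-level CPU offload. This is just a sketch against the plain diffusers `WanPipeline` for the base Wan 2.1 14B model; I'm assuming the same approach would carry over to the Vista4D checkpoint, and that torchao's `float8_weight_only` config applies cleanly to the transformer's linear layers:

```python
import torch
from diffusers import WanPipeline
from torchao.quantization import quantize_, float8_weight_only

# Load the base Wan 2.1 14B pipeline in bf16. (Plain diffusers checkpoint
# here; I'm assuming the Vista4D weights would slot in the same way.)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers",
    torch_dtype=torch.bfloat16,
)

# Weight-only fp8 on the transformer's linear layers: roughly halves the
# weight footprint relative to bf16, at some quality cost.
quantize_(pipe.transformer, float8_weight_only())

# Move whole components (text encoder, transformer, VAE) onto the GPU one
# at a time instead of thrashing individual layers back and forth.
pipe.enable_model_cpu_offload()

video = pipe(
    prompt="a drone shot over a coastal town at sunset",
    num_frames=33,
    num_inference_steps=30,
).frames[0]
```

My understanding is that model-level offload (one whole component resident at a time) is much faster than per-layer/sequential offload once the quantized transformer actually fits in VRAM, which is why an official fp8 checkpoint would help so much here.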
Happy to test any experimental builds. Thanks!