Hi, first of all great work on Vista4D — really impressive results!
I've been trying to run inference on a single RTX 5090 (32GB VRAM), and it's extremely slow because the 14B Wan model constantly offloads between VRAM and system RAM (the full model is ~64GB in bfloat16).
A few questions:

- Are there plans to release an fp8 or otherwise quantized version of the checkpoint that would fit within 32GB VRAM?
- Is there a recommended minimum VRAM for reasonable single-GPU inference speeds?
- Would a ComfyUI integration be something you're considering?
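In the meantime, here's roughly what I've been experimenting with on my end: weight-only fp8 quantization of the 14B transformer via torchao, plus diffusers' model-level CPU offload. This is just a sketch against the plain diffusers `WanPipeline` for the base Wan 2.1 14B model; I'm assuming the same approach would carry over to the Vista4D checkpoint, and that torchao's `float8_weight_only` config applies cleanly to the transformer's linear layers:

```python
import torch
from diffusers import WanPipeline
from torchao.quantization import quantize_, float8_weight_only

# Load the base Wan 2.1 14B pipeline in bf16. (Plain diffusers checkpoint
# here; I'm assuming the Vista4D weights would slot in the same way.)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers",
    torch_dtype=torch.bfloat16,
)

# Weight-only fp8 on the transformer's linear layers: roughly halves the
# weight footprint relative to bf16, at some quality cost.
quantize_(pipe.transformer, float8_weight_only())

# Move whole components (text encoder, transformer, VAE) onto the GPU one
# at a time instead of thrashing individual layers back and forth.
pipe.enable_model_cpu_offload()

video = pipe(
    prompt="a drone shot over a coastal town at sunset",
    num_frames=33,
    num_inference_steps=30,
).frames[0]
```

My understanding is that model-level offload (one whole component resident at a time) is much faster than per-layer/sequential offload once the quantized transformer actually fits in VRAM, which is why an official fp8 checkpoint would help so much here.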
Happy to test any experimental builds. Thanks!