Hello, thanks for the great work and for sharing the impressive results!
I have a question regarding the reported performance of 17 FPS on an H100. Could you please clarify how this FPS figure was calculated? Does the measurement include the time for VAE decoding? Also, what are the separate inference times for the diffusion model and the VAE on a 5 s, 81-frame video?
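To make the question concrete, here is a minimal sketch of the two definitions I can imagine. The function name `fps` and the timing values are placeholders I made up for illustration, not measurements and not taken from the repository; only the 81-frame clip length comes from the question above.

```python
def fps(num_frames: int, diffusion_s: float, vae_decode_s: float = 0.0) -> float:
    """Frames per second for one clip: frames divided by the time that is counted."""
    total_s = diffusion_s + vae_decode_s
    if total_s <= 0:
        raise ValueError("total time must be positive")
    return num_frames / total_s


# Placeholder timings in seconds (hypothetical, NOT measured results).
NUM_FRAMES = 81
diffusion_s = 4.0
vae_decode_s = 1.0

# Definition A: only the diffusion (denoising) time is counted.
fps_diffusion_only = fps(NUM_FRAMES, diffusion_s)

# Definition B: diffusion plus VAE decoding is counted.
fps_end_to_end = fps(NUM_FRAMES, diffusion_s, vae_decode_s)

print(f"diffusion only: {fps_diffusion_only:.2f} FPS")
print(f"end to end:     {fps_end_to_end:.2f} FPS")
```

Knowing which of these (or some other definition) the reported number corresponds to would resolve most of my question.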
Understanding the exact composition of this benchmark would be very helpful for accurate comparisons and reproducibility. Thank you!