In CausVid/Self-Forcing, performing an extra generation pass at t=0 after running through self.denoising_step_list makes sense: the model is trained on clean frames from the target distribution, so it is important that inference match that distribution as closely as possible, which justifies this update.
https://github.com/tianweiy/CausVid/blob/master/causvid/models/wan/causal_inference.py#L188
https://github.com/guandeh17/Self-Forcing/blob/main/pipeline/causal_inference.py#L228
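For context, the pattern in question can be sketched roughly as below. This is a minimal toy sketch, not the actual CausVid/Self-Forcing API: the generator is stubbed out, the names (`generate_chunk`, `kv_cache`) are hypothetical, and the real pipelines re-noise the latent between steps and write attention KV caches rather than a list of timesteps.

```python
# Toy sketch of the inference pattern under discussion (hypothetical names,
# stubbed generator -- NOT the real CausVid/Self-Forcing code).

def generator(latent, t, kv_cache):
    """Stub for one denoising forward pass. Records the timestep the
    KV cache was written at; the real model updates attention KV here."""
    kv_cache.append(t)
    return [x * 0.5 for x in latent]  # placeholder "denoising"

def generate_chunk(noise, denoising_step_list, kv_cache):
    latent = noise
    for t in denoising_step_list:
        latent = generator(latent, t, kv_cache)
        # (real code re-noises `latent` to the next timestep here)
    # Extra pass at t=0: its output is NOT returned as video; it only
    # refreshes the cache with clean-frame activations for future chunks.
    _clean = generator(latent, 0, kv_cache)
    return latent  # the video comes from the last t != 0 prediction

cache = []
frames = generate_chunk([1.0, 2.0], [1000, 750, 500, 250], cache)
# cache[-1] == 0: the prior for future frames was written at t=0,
# while `frames` is the t != 0 prediction -- the mismatch raised below.
```

The sketch makes the asymmetry concrete: the cache's last write comes from the t=0 pass, while the returned frames come from the preceding step.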
However, these t=0 latents aren't used for anything other than as a prior for subsequent generation. If they have already been generated as 'clean frames', why don't we use them as the output video? And if the output video is instead built from predictions at timestep t != 0, why do we tolerate a mismatch between the video we see and the priors used for future generation?