zengqunzhao/Relax-Forcing

Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation

Zengqun Zhao · Yanzuo Lu · Ziquan Liu · Jifei Song · Jiankang Deng · Ioannis Patras

Queen Mary University of London · Imperial College London

arXiv · Project Page


💡 TL;DR

Relax Forcing improves long-horizon autoregressive video generation by replacing dense full-history attention with a structured memory design: Sink for stability, Tail for short-term continuity, and selected History for motion guidance. This yields better temporal dynamics and consistency on VBench-Long while reducing attention overhead and improving scalability.

📖 Abstract

Autoregressive (AR) video diffusion has recently emerged as a promising paradigm for long video generation, enabling causal synthesis beyond the limits of bidirectional models. To address the training-inference mismatch, a series of self-forcing strategies have been proposed to improve rollout stability by conditioning the model on its own predictions during training. While these approaches substantially mitigate exposure bias, extending generation to minute-scale horizons remains challenging due to progressive temporal degradation.

In this work, we show that this limitation is not primarily caused by insufficient memory, but by how temporal memory is utilised during inference. Through empirical analysis, we find that increasing memory does not consistently improve long-horizon generation, and that the temporal placement of historical context significantly influences motion dynamics while leaving visual quality largely unchanged.

Motivated by this insight, we introduce Relax Forcing, a structured temporal memory mechanism for AR diffusion. Instead of attending to the dense generated history, Relax Forcing decomposes temporal context into three functional roles:

  • 🟢 Sink: global stability anchors
  • 🟣 History: dynamically selected intermediate motion structure
  • 🔵 Tail: recent short-term continuity

This design mitigates error accumulation during extrapolation while preserving motion evolution. Experiments on VBench-Long demonstrate that Relax Forcing improves motion dynamics and overall temporal consistency while reducing attention overhead.
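The three roles above can be pictured as a partition of the generated frame indices. The following is a minimal sketch under stated assumptions, not the released implementation: the function name `build_memory_indices`, the parameters `n_sink`, `n_tail`, and `n_history`, and the evenly spaced placeholder used to pick History frames are all hypothetical and chosen only for illustration.

```python
def build_memory_indices(num_generated, n_sink=1, n_tail=3, n_history=2):
    """Split already-generated frame indices into Sink / History / Tail.

    Hypothetical helper: the sizes and the History rule are illustrative
    assumptions, not values taken from the paper.
    """
    # Sink: the earliest frames, kept as global anchors.
    sink = list(range(min(n_sink, num_generated)))

    # Tail: the most recent frames, kept for short-term continuity.
    tail_start = max(len(sink), num_generated - n_tail)
    tail = list(range(tail_start, num_generated))

    # Everything in between is a History candidate.
    middle = list(range(len(sink), tail_start))

    # Placeholder selection: evenly spaced picks from the middle region.
    # (The README describes a dynamic selection rule instead.)
    if n_history <= 0:
        history = []
    elif n_history >= len(middle):
        history = middle
    else:
        step = len(middle) / n_history
        history = [middle[int(i * step + step / 2)] for i in range(n_history)]

    return {"sink": sink, "history": history, "tail": tail}


memory = build_memory_indices(num_generated=20)
print(memory)
```

With 20 generated frames and the default sizes, only 6 frames are kept in memory rather than all 20, which is the sense in which a structured memory can reduce attention overhead.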

🔍 Method

Method Overview

Figure 1. Overview of Relaxed KV Memory. Instead of retaining dense chronological history, temporal memory is decomposed into three functional components: Sink for global anchors, History for intermediate motion structure, and Tail for recent continuity. During generation, candidate historical frames are dynamically selected to remain aligned with Sink while avoiding redundancy with Tail. The selected memory is then integrated through a relaxed KV formulation with adjusted relative positional encoding, enabling the model to leverage non-contiguous temporal context while preserving long-range consistency during autoregressive rollout.
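To make the selection idea in the caption concrete, here is a hedged sketch of one way it could look. Everything in it is an assumption for illustration: the per-frame feature vectors, the cosine-based score (similarity to Sink minus similarity to Tail), the `redundancy_weight` parameter, and the contiguous re-indexing used as a stand-in for the adjusted relative positional encoding. The paper's actual criterion and positional scheme may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_history(candidates, sink_feat, tail_feat, k, redundancy_weight=1.0):
    """Pick k History frames from {frame_index: feature_vector}.

    Assumed score: reward alignment with Sink, penalise redundancy with Tail.
    """
    scored = []
    for idx, feat in candidates.items():
        score = cosine(feat, sink_feat) - redundancy_weight * cosine(feat, tail_feat)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier frame index.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    # Return the chosen frames in chronological order.
    return sorted(idx for _, idx in top)


def compact_positions(sink, history, tail):
    """Map the kept, non-contiguous frames to contiguous positions.

    Assumed stand-in for an adjusted relative positional encoding.
    """
    kept = sorted(sink + history + tail)
    return {frame: pos for pos, frame in enumerate(kept)}


candidates = {3: [1.0, 0.0], 5: [0.0, 1.0], 7: [1.0, 1.0]}
history = select_history(candidates, sink_feat=[1.0, 0.0], tail_feat=[0.0, 1.0], k=2)
positions = compact_positions(sink=[0], history=history, tail=[10, 11])
print(history, positions)
```

In this toy example, frame 5 is dropped because it duplicates the Tail feature, while frames 3 and 7 are kept and re-indexed alongside Sink and Tail.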

📈 Quantitative Results

Quantitative Comparison on VBench-Long

Relax Forcing achieves state-of-the-art performance on VBench-Long, outperforming all compared methods in Dynamic Degree and Average score at both 30-second and 60-second generation lengths, while maintaining competitive throughput (16.33 FPS).

🗓️ Release Progress

  • Paper
  • Code

🔖 Citation

If you find this work useful for your research, please consider citing our paper and giving this repo a ⭐️.

@misc{zhao2026relaxforcing,
      title={Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation}, 
      author={Zengqun Zhao and Yanzuo Lu and Ziquan Liu and Jifei Song and Jiankang Deng and Ioannis Patras},
      year={2026},
      eprint={2603.21366},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.21366}, 
}

🙏 Acknowledgements

  • Self-Forcing: the foundational codebase and algorithm we built upon. Thanks for their wonderful work.
  • Wan: the base video diffusion model we built upon. Thanks for their wonderful work.
