Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation

Zengqun Zhao · Yanzuo Lu · Ziquan Liu · Jifei Song · Jiankang Deng · Ioannis Patras

Queen Mary University of London · Imperial College London

💡 TL;DR

Relax Forcing improves long-horizon autoregressive video generation by replacing dense full-history attention with a structured memory design: Sink for stability, Tail for short-term continuity, and selected History for motion guidance. This yields better temporal dynamics and consistency on VBench-Long while reducing attention overhead and improving scalability.

📖 Abstract

Autoregressive (AR) video diffusion has recently emerged as a promising paradigm for long video generation, enabling causal synthesis beyond the limits of bidirectional models. To address the training–inference mismatch, a series of self-forcing strategies has been proposed that improve rollout stability by conditioning the model on its own predictions during training. While these approaches substantially mitigate exposure bias, extending generation to minute-scale horizons remains challenging due to progressive temporal degradation.

In this work, we show that this limitation is not primarily caused by insufficient memory, but by how temporal memory is utilised during inference. Through empirical analysis, we find that increasing memory does not consistently improve long-horizon generation, and that the temporal placement of historical context significantly influences motion dynamics while leaving visual quality largely unchanged.

Motivated by this insight, we introduce Relax Forcing, a structured temporal memory mechanism for AR diffusion. Instead of attending to the full dense history of generated frames, Relax Forcing decomposes temporal context into three functional roles:

🟢 Sink — global stability anchors
🟣 History — dynamically selected intermediate motion structure
🔵 Tail — recent short-term continuity

This design mitigates error accumulation during extrapolation while preserving motion evolution. Experiments on VBench-Long demonstrate that Relax Forcing improves motion dynamics and overall temporal consistency while reducing attention overhead.
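To make the three roles concrete, the following is a minimal sketch of how the set of attended frames could be assembled at each rollout step, and how the number of attended KV frames compares with dense full-history attention. It is illustrative only and not the released implementation: the function names `build_memory` and `kv_frames_attended`, the frame-level granularity, and the default sizes `n_sink=1` and `n_tail=3` are all assumptions made for this example.

```python
def build_memory(t, n_sink=1, n_tail=3, history=()):
    """Return the past frame indices that frame `t` attends to.

    Hypothetical layout (assumed for illustration):
      sink    = the first `n_sink` generated frames
      tail    = the `n_tail` most recent frames
      history = caller-selected intermediate frames, deduplicated
    """
    past = range(t)
    sink = [i for i in past if i < n_sink]
    # Tail never overlaps the sink, even early in the rollout.
    tail = [i for i in past if i >= max(n_sink, t - n_tail)]
    taken = set(sink) | set(tail)
    hist = sorted(i for i in set(history) if 0 <= i < t and i not in taken)
    return sink + hist + tail


def kv_frames_attended(num_frames, **kwargs):
    """Total number of past frames attended over a whole rollout."""
    return sum(len(build_memory(t, **kwargs)) for t in range(num_frames))


if __name__ == "__main__":
    print(build_memory(10, history=(4,)))          # sink, history, tail indices
    dense = sum(range(100))                        # frame t attends to all t past frames
    structured = kv_frames_attended(100)           # bounded memory per step
    print(dense, structured)
```

With a fixed budget the per-step memory stays constant once the rollout is longer than the budget, whereas dense attention grows linearly per step and quadratically over the rollout.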

🔍 Method

Figure 1. Overview of Relaxed KV Memory. Instead of retaining dense chronological history, temporal memory is decomposed into three functional components: Sink for global anchors, History for intermediate motion structure, and Tail for recent continuity. During generation, candidate historical frames are dynamically selected to remain aligned with Sink while avoiding redundancy with Tail. The selected memory is then integrated through a relaxed KV formulation with adjusted relative positional encoding, enabling the model to leverage non-contiguous temporal context while preserving long-range consistency during autoregressive rollout.
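The selection and positional adjustment described in the caption can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the paper's exact formulation: the scoring rule (cosine similarity to the Sink centroid minus a weighted similarity to the Tail centroid), the weight `lam`, the per-frame descriptor matrix `feats`, and the compact re-indexing in `relaxed_positions` are all hypothetical choices made for this example.

```python
import numpy as np


def _unit(x):
    """L2-normalise along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


def select_history(feats, sink_idx, tail_idx, k=2, lam=1.0):
    """Pick k intermediate frames that resemble Sink but not Tail.

    feats: (T, D) array of per-frame descriptors (assumed given).
    Returns the selected frame indices in temporal order.
    """
    taken = set(sink_idx) | set(tail_idx)
    cand = [i for i in range(len(feats)) if i not in taken]
    if not cand:
        return []
    f = _unit(np.asarray(feats, dtype=float))
    sink_c = _unit(f[list(sink_idx)].mean(axis=0))
    tail_c = _unit(f[list(tail_idx)].mean(axis=0))
    # Assumed score: stay aligned with Sink, avoid redundancy with Tail.
    score = f[cand] @ sink_c - lam * (f[cand] @ tail_c)
    top = np.argsort(-score)[:k]
    return sorted(cand[j] for j in top)


def relaxed_positions(memory_idx):
    """Map non-contiguous frame indices to compact positions 0..M-1.

    One assumed way to keep relative offsets small when the
    memory skips over large parts of the history.
    """
    return {frame: pos for pos, frame in enumerate(sorted(memory_idx))}


if __name__ == "__main__":
    feats = np.array([[1, 0], [1, 0], [0.9, 0.1], [0.1, 0.9], [0, 1], [0, 1]])
    hist = select_history(feats, sink_idx=[0], tail_idx=[5], k=2)
    print(hist)
    print(relaxed_positions([0] + hist + [5]))
```

In this toy setup, the frames closest to the Sink descriptor and farthest from the Tail descriptor are kept as History, and the resulting non-contiguous memory is re-indexed before positional encoding is applied.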

📈 Quantitative Results

Relax Forcing achieves state-of-the-art performance on VBench-Long, outperforming all compared methods in Dynamic Degree and Average score at both 30-second and 60-second generation lengths, while maintaining competitive throughput (16.33 FPS).

🗓️ Release Progress

Paper
Code

🔖 Citation

If you find this work useful for your research, please consider citing our paper and giving this repo a ⭐️.

@misc{zhao2026relaxforcing,
      title={Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation}, 
      author={Zengqun Zhao and Yanzuo Lu and Ziquan Liu and Jifei Song and Jiankang Deng and Ioannis Patras},
      year={2026},
      eprint={2603.21366},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.21366}, 
}

🙏 Acknowledgements

Self-Forcing: the foundational codebase and algorithm we built upon. Thanks for their wonderful work.
Wan: the base video diffusion model we built upon. Thanks for their wonderful work.
