This is the official repository for StateSpaceDiffuser: Bringing Long Context to Diffusion World Models (NeurIPS 2025).
Authors: Nedko Savov, Naser Kazemi, Deheng Zhang, Danda Pani Paudel, Xi Wang, Luc Van Gool
StateSpaceDiffuser combines a state-space model, which maintains long-horizon memory, with a diffusion model, which renders detailed visuals, allowing it to produce stable long-term video predictions. It tackles the challenge of drift in world models over extended rollouts. The approach achieves temporally consistent generation in both 2D and 3D interactive environments, significantly outperforming diffusion-only baselines on long-horizon tasks. Expect the code here soon!
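Until the official code lands, here is a minimal toy sketch of the general idea described above: a state-space recurrence carries long-context memory across the rollout, and its state conditions an iterative denoising loop. All names, dimensions, and update rules below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch only -- the official StateSpaceDiffuser code is not yet
# released. Scalars stand in for latent states and frames.

def ssm_step(h, x, A=0.9, B=0.1):
    """Toy linear state-space update: h_t = A * h_{t-1} + B * x_t.
    The decaying A keeps a long-horizon summary of past inputs."""
    return A * h + B * x

def denoise_step(frame, state, alpha=0.5):
    """Stub for one diffusion denoising step, conditioned on the SSM state
    so the generated frame stays consistent with long-term memory."""
    return alpha * frame + (1 - alpha) * state

def rollout(actions, denoise_steps=3):
    """Autoregressive rollout: update memory per action, then iteratively
    denoise a frame conditioned on that memory."""
    h, frame, frames = 0.0, 0.0, []
    for x in actions:
        h = ssm_step(h, x)               # long-context memory update
        for _ in range(denoise_steps):   # conditioned denoising loop
            frame = denoise_step(frame, h)
        frames.append(frame)
    return frames
```

The point of the sketch is the division of labor: the recurrence is cheap and never forgets (no fixed context window), while the denoising loop handles visual detail, which is the drift-avoidance story the paper tells.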