Decoupled Diffusion Sparks Adaptive Scene Generation

Yunsong Zhou, Naisheng Ye, William Ljungbergh, Tianyu Li, Jiazhi Yang, Zetong Yang, Hongzi Zhu, Christoffer Petersson, and Hongyang Li

in Proceedings of IEEE/CVF ICCV 2025

Controllable scene generation could substantially reduce the cost of collecting diverse data for autonomous driving.
Prior works formulate traffic layout generation as a predictive process, either denoising entire sequences at once or iteratively predicting the next frame.
However, full-sequence denoising hinders online reaction, while short-sighted next-frame prediction lacks precise goal-state guidance.
Further, learned models struggle to generate complex or challenging scenarios because open datasets are dominated by safe, ordinary driving behaviors.
To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states.
At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process.
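The decoupled formulation can be pictured as giving each scene token its own diffusion timestep, with clean tokens acting as conditions (e.g., observed agents or goal states) while the rest are denoised asynchronously. Below is a minimal PyTorch sketch of this idea under a standard DDPM-style schedule; the function name, shapes, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def partial_noise_masking(tokens, num_timesteps=1000, cond_prob=0.3):
    """Hypothetical sketch: noise scene tokens with independent timesteps.

    tokens: (B, N, D) batch of N agent/goal tokens per scene.
    Tokens frozen at t=0 serve as clean conditioning; the others are
    noised to their own timestep so denoising can proceed per token.
    """
    B, N, _ = tokens.shape
    # Sample an independent timestep per token, not one per sequence.
    t = torch.randint(1, num_timesteps, (B, N))
    # Randomly freeze a subset of tokens as clean conditions (t = 0).
    cond_mask = torch.rand(B, N) < cond_prob
    t = t.masked_fill(cond_mask, 0)
    # Standard DDPM-style forward noising, indexed per token.
    betas = torch.linspace(1e-4, 2e-2, num_timesteps)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    a = alpha_bar[t].unsqueeze(-1)                      # (B, N, 1)
    noise = torch.randn_like(tokens)
    noisy = a.sqrt() * tokens + (1.0 - a).sqrt() * noise
    # Keep conditioning tokens exactly clean.
    noisy = torch.where(cond_mask.unsqueeze(-1), tokens, noisy)
    return noisy, t, cond_mask

# Example: 2 scenes, 32 tokens of 8-dim state each.
noisy, t, mask = partial_noise_masking(torch.randn(2, 32, 8))
```

Training a denoiser on such mixtures of noise levels is what lets the model update some tokens (the environment) while others stay fixed (goals or observations) at inference time.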
To complement challenging scenario generation, we collect a dataset of complex corner cases covering 540 hours of simulated data, including high-risk interactions such as cut-ins, sudden braking, and collisions.
Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error.
We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.
