Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving

End-to-end (E2E) autonomous driving (AD) models require diverse, high-quality data to perform well across various driving scenarios. However, collecting large-scale real-world data is expensive and time-consuming, making high-fidelity synthetic data essential for enhancing data diversity and model robustness. Existing driving simulators for synthetic data generation have significant limitations: game-engine-based simulators struggle to produce realistic sensor data, while NeRF-based and diffusion-based methods face efficiency challenges. Additionally, recent simulators designed for closed-loop evaluation provide limited interaction with other vehicles, failing to simulate complex real-world traffic dynamics. To address these issues, we introduce SceneCrafter, a realistic, interactive, and efficient AD simulator based on 3D Gaussian Splatting (3DGS). SceneCrafter not only efficiently generates realistic driving logs across diverse traffic scenarios but also enables robust closed-loop evaluation of end-to-end models. Experimental results demonstrate that SceneCrafter serves as both a reliable evaluation platform and an efficient data generator that significantly improves end-to-end model generalization.

@article{ge2025_2503.18108,
  title={Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving},
  author={Junhao Ge and Zuhong Liu and Longteng Fan and Yifan Jiang and Jiaqi Su and Yiming Li and Zhejun Zhang and Siheng Chen},
  journal={arXiv preprint arXiv:2503.18108},
  year={2025}
}