RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, notably a 3x lower collision rate. Abundant closed-loop results are presented at this https URL.
@article{gao2025_2502.13144,
  title={RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning},
  author={Hao Gao and Shaoyu Chen and Bo Jiang and Bencheng Liao and Yiang Shi and Xiaoyang Guo and Yuechuan Pu and Haoran Yin and Xiangyu Li and Xinbang Zhang and Ying Zhang and Wenyu Liu and Qian Zhang and Xinggang Wang},
  journal={arXiv preprint arXiv:2502.13144},
  year={2025}
}
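
The abstract's core training idea, a reinforcement-learning objective with safety-oriented rewards plus an imitation-learning regularization term on human driving data, can be illustrated with a minimal sketch. Everything below (the `DrivingPolicy` module, dimensions, the specific reward and loss forms, and the `il_weight` coefficient) is an assumption for illustration only, not the paper's actual architecture or implementation.

```python
import torch
import torch.nn as nn

# Hypothetical policy: maps a state feature vector to a Gaussian over
# 2-D driving actions (e.g., steering, acceleration). Not the paper's model.
class DrivingPolicy(nn.Module):
    def __init__(self, state_dim: int = 64, action_dim: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, action_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, state: torch.Tensor) -> torch.distributions.Normal:
        mean = self.backbone(state)
        return torch.distributions.Normal(mean, self.log_std.exp())


def safety_reward(progress: torch.Tensor, collided: torch.Tensor) -> torch.Tensor:
    """Toy safety-oriented reward: progress bonus minus a large collision penalty."""
    return progress - 10.0 * collided.float()


def combined_loss(policy, states, actions, advantages,
                  expert_states, expert_actions, il_weight: float = 0.1) -> torch.Tensor:
    # RL term: advantage-weighted negative log-likelihood (policy-gradient surrogate)
    # on rollouts collected in the closed-loop (e.g., 3DGS) environment.
    rl_nll = -policy.dist(states).log_prob(actions).sum(-1)
    rl_loss = (advantages * rl_nll).mean()
    # IL regularization: behavior cloning on logged human driving trajectories.
    il_loss = -policy.dist(expert_states).log_prob(expert_actions).sum(-1).mean()
    return rl_loss + il_weight * il_loss


if __name__ == "__main__":
    policy = DrivingPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    # Random placeholder batch standing in for closed-loop rollouts and expert logs.
    states, actions = torch.randn(32, 64), torch.randn(32, 2)
    advantages = torch.randn(32)
    expert_states, expert_actions = torch.randn(32, 64), torch.randn(32, 2)
    loss = combined_loss(policy, states, actions, advantages,
                         expert_states, expert_actions)
    opt.zero_grad(); loss.backward(); opt.step()
```

The sketch only shows how an IL term can act as a regularizer on an RL surrogate loss and how a collision penalty can enter the reward; the paper's actual reward design, policy architecture, and optimization procedure are described in the full text.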