The application of reinforcement learning (RL) to dynamic resource allocation in optical networks has been the focus of intense research activity in recent years, with almost 100 peer-reviewed papers. We present a review of progress in the field, and identify significant gaps in benchmarking practices and reproducibility. To determine the strongest benchmark algorithms, we systematically evaluate several heuristics across diverse network topologies. We find that path count and sort criteria for path selection significantly affect the benchmark performance. We meticulously recreate the problems from five landmark papers and apply the improved benchmarks. Our comparisons demonstrate that simple heuristics consistently match or outperform the published RL solutions, often with an order of magnitude lower blocking probability. Furthermore, we present empirical lower bounds on network blocking using a novel defragmentation-based method, revealing that potential improvements over the benchmark heuristics are limited to 19-36% increased traffic load for the same blocking performance in our examples. We make our simulation framework and results publicly available to promote reproducible research and standardized evaluation: this https URL.
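To make the kind of benchmark heuristic discussed above concrete, the following is a minimal sketch of a k-shortest-path, first-fit (KSP-FF) allocation policy, the family of simple heuristics commonly benchmarked against RL agents in dynamic resource allocation. The topology, slot count, and function names are illustrative assumptions, not the paper's exact setup; the path count `k` and the hop-count sort criterion are precisely the knobs the abstract reports as significantly affecting benchmark performance.

```python
# Illustrative KSP-FF heuristic (an assumption for demonstration, not the
# paper's implementation): enumerate up to k loop-free candidate paths,
# shortest (fewest hops) first, and allocate spectrum with first-fit.

def k_shortest_paths(adj, src, dst, k):
    """Enumerate up to k loop-free paths, fewest hops first (BFS-style search)."""
    paths, queue = [], [[src]]
    while queue and len(paths) < k:
        queue.sort(key=len)            # sort criterion: hop count
        path = queue.pop(0)
        node = path[-1]
        if node == dst:
            paths.append(path)
            continue
        for nbr in adj[node]:
            if nbr not in path:        # keep paths loop-free
                queue.append(path + [nbr])
    return paths

def first_fit(spectrum, path, width):
    """Lowest slot index free on every link of the path, or None if blocked."""
    links = list(zip(path, path[1:]))
    n_slots = len(next(iter(spectrum.values())))
    for start in range(n_slots - width + 1):
        if all(not any(spectrum[l][start:start + width]) for l in links):
            return start
    return None

def ksp_ff(adj, spectrum, src, dst, width, k=3):
    """Try each candidate path in sorted order; allocate on the first that fits."""
    for path in k_shortest_paths(adj, src, dst, k):
        start = first_fit(spectrum, path, width)
        if start is not None:
            for link in zip(path, path[1:]):
                for s in range(start, start + width):
                    spectrum[link][s] = True
            return path, start
    return None                        # request blocked

# Toy 4-node ring; each directed link carries 8 spectrum slots.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
spectrum = {(u, v): [False] * 8 for u in adj for v in adj[u]}
print(ksp_ff(adj, spectrum, 0, 2, width=2))  # -> ([0, 1, 2], 0)
```

Blocking probability for such a heuristic is then estimated by driving it with a stream of random arrivals and departures and counting the fraction of requests for which `ksp_ff` returns `None`.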
@article{doherty2025_2502.12804,
  title={Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?},
  author={Michael Doherty and Robin Matzner and Rasoul Sadeghi and Polina Bayvel and Alejandra Beghelli},
  journal={arXiv preprint arXiv:2502.12804},
  year={2025}
}