Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Learning effective visual representations enables agents to extract meaningful information from raw sensory inputs, which is essential for generalizing across different tasks. However, evaluating representation learning separately from policy learning remains a challenge with most reinforcement learning (RL) benchmarks. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that reimagines the classic 8-tile puzzle with a visual observation space of images sourced from arbitrarily large datasets. SPGym provides precise control over representation complexity through visual diversity, allowing researchers to systematically scale the representation learning challenge while maintaining consistent environment dynamics. Despite the apparent simplicity of the task, our experiments with both model-free and model-based RL algorithms reveal fundamental limitations in current methods. As we increase visual diversity by expanding the pool of possible images, all tested algorithms show significant performance degradation, with even state-of-the-art methods struggling to generalize across different visual inputs while maintaining consistent puzzle-solving capabilities. These results highlight critical gaps in visual representation learning for RL and provide clear directions for improving robustness and generalization in decision-making systems.
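The setup described above, fixed puzzle dynamics combined with an image pool whose size controls visual diversity, can be illustrated with a minimal sketch. This is not the SPGym API: the class name `ToySlidingPuzzle`, its methods, the action encoding, and the blank-tile convention are all hypothetical, chosen only to show how the transition rules can stay identical while the observations vary with the sampled image.

```python
import numpy as np


class ToySlidingPuzzle:
    """Hypothetical 3x3 sliding puzzle with image observations (not the SPGym API)."""

    # Assumed action encoding: the blank moves up, down, left, right.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, image_pool, grid=3, seed=0):
        self.pool = image_pool  # list of HxWxC arrays; its length sets visual diversity
        self.grid = grid
        self.rng = np.random.default_rng(seed)

    def reset(self, shuffle_steps=50):
        # Sample one image per episode; the puzzle rules never depend on it.
        self.image = self.pool[int(self.rng.integers(len(self.pool)))]
        n = self.grid * self.grid
        self.board = np.arange(n).reshape(self.grid, self.grid)  # tile n-1 is the blank
        # Shuffle with random moves from the solved state, so the result is always solvable.
        for _ in range(shuffle_steps):
            self._move(int(self.rng.integers(4)))
        return self.render()

    def _move(self, action):
        blank = self.board.size - 1
        r, c = (int(v) for v in np.argwhere(self.board == blank)[0])
        dr, dc = self.MOVES[action]
        nr, nc = r + dr, c + dc
        if 0 <= nr < self.grid and 0 <= nc < self.grid:  # moves off the board are no-ops
            self.board[r, c], self.board[nr, nc] = self.board[nr, nc], self.board[r, c]

    def step(self, action):
        self._move(action)
        return self.render(), self.solved()

    def solved(self):
        target = np.arange(self.board.size).reshape(self.grid, self.grid)
        return bool(np.array_equal(self.board, target))

    def render(self):
        # Cut the image into grid x grid patches and place them according to the board.
        h = self.image.shape[0] // self.grid
        w = self.image.shape[1] // self.grid
        obs = np.zeros_like(self.image[: h * self.grid, : w * self.grid])
        for r in range(self.grid):
            for c in range(self.grid):
                tile = int(self.board[r, c])
                if tile == self.board.size - 1:
                    continue  # the blank stays black
                sr, sc = divmod(tile, self.grid)
                obs[r * h:(r + 1) * h, c * w:(c + 1) * w] = \
                    self.image[sr * h:(sr + 1) * h, sc * w:(sc + 1) * w]
        return obs


# Usage: a larger pool means more distinct images for the same underlying puzzle.
pool = [np.full((6, 6, 3), k + 1, dtype=np.uint8) for k in range(4)]
env = ToySlidingPuzzle(pool, seed=0)
obs = env.reset(shuffle_steps=20)
```

In this sketch the board logic never reads the pixels, so growing `pool` changes only what the agent sees, not what it has to do, which is the separation between representation difficulty and task dynamics that the abstract describes.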
@article{oliveira2025_2410.14038,
  title={Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning},
  author={Bryan L. M. de Oliveira and Murilo L. da Luz and Bruno Brandão and Luana G. B. Martins and Telma W. de L. Soares and Luckeciano C. Melo},
  journal={arXiv preprint arXiv:2410.14038},
  year={2025}
}