Enhancing Symbolic Regression with Quality-Diversity and Physics-Inspired Constraints

This paper presents QDSR, an advanced symbolic Regression (SR) system that integrates genetic programming (GP), a quality-diversity (QD) algorithm, and a dimensional analysis (DA) engine. Our method focuses on exact symbolic recovery of known expressions from datasets, with a particular emphasis on the Feynman-AI benchmark. On this widely used collection of 117 physics equations, QDSR achieves an exact recovery rate of 91.6~, surpassing all previous SR methods by over 20 percentage points. Our method also exhibits strong robustness to noise. Beyond QD and DA, this high success rate results from a profitable trade-off between vocabulary expressiveness and search space size: we show that significantly expanding the vocabulary with precomputed meaningful variables (e.g., dimensionless combinations and well-chosen scalar products) often reduces equation complexity, ultimately leading to better performance. Ablation studies will also show that QD alone already outperforms the state-of-the-art. This suggests that a simple integration of QD, by projecting individuals onto a QD grid, can significantly boost performance in existing algorithms, without requiring major system overhauls.
View on arXiv@article{bruneton2025_2503.19043, title={ Enhancing Symbolic Regression with Quality-Diversity and Physics-Inspired Constraints }, author={ J.-P. Bruneton }, journal={arXiv preprint arXiv:2503.19043}, year={ 2025 } }