
SymMatika: Structure-Aware Symbolic Discovery

Main: 9 pages
5 figures
Bibliography: 3 pages
Appendix: 3 pages
Abstract

Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. Existing methods have advanced the discovery of either explicit mappings (i.e., y = f(x)) or implicit relations (i.e., F(x, y) = 0), but few modern, accessible frameworks support both. Moreover, most approaches treat each candidate expression in isolation and do not reuse recurring structural patterns that could accelerate search. We introduce SymMatika, a hybrid SR algorithm that combines multi-island genetic programming (GP) with a reusable motif library inspired by biological sequence analysis. SymMatika identifies high-impact substructures in top-performing candidates and reintroduces them to guide future generations. It also incorporates a feedback-driven evolutionary engine and supports both explicit and implicit relation discovery, the latter via implicit-derivative metrics. Across benchmarks, SymMatika achieves state-of-the-art recovery rates, including a 5.1% improvement over the previous best results on Nguyen, the first recovery of Nguyen-12, and competitive performance on the Feynman equations. It also recovers implicit physical laws from Eureqa datasets up to 100× faster. These results demonstrate the power of structure-aware evolutionary search for scientific discovery. To support broader research in interpretable modeling and symbolic discovery, we have open-sourced the full SymMatika framework.
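The abstract does not define SymMatika's implicit-derivative metric. The sketch below illustrates the general derivative-matching idea behind such metrics, in the spirit of Schmidt and Lipson's work that produced Eureqa. For a candidate relation F(x, y) = 0, the implicit function theorem gives dy/dx = -(∂F/∂x)/(∂F/∂y), and this slope is compared against the slope estimated from the data. A plain residual |F| would give a perfect score to trivial candidates such as F = x - x, and derivative matching avoids that. The function name `implicit_derivative_error`, the finite-difference step `h`, the masking threshold `eps`, and the log(1 + |error|) aggregation are all hypothetical choices for illustration. This is not SymMatika's actual implementation.

```python
import numpy as np


def implicit_derivative_error(F, x, y, dx_dt, dy_dt, h=1e-6, eps=1e-8):
    """Score a candidate implicit relation F(x, y) = 0 (hypothetical helper).

    F            : vectorized callable F(x, y) -> ndarray, the candidate relation.
    dx_dt, dy_dt : derivatives of the observed data along a shared parameter t.
    Returns mean log(1 + |implied - observed|); lower is better.
    """
    # Partial derivatives of the candidate via central finite differences.
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    # Keep only points where both slopes are well defined.
    mask = (np.abs(Fy) > eps) & (np.abs(dx_dt) > eps)
    if not np.any(mask):
        # Degenerate candidate (e.g. F = x - x): it implies no slope at all.
        return float("inf")
    implied = -Fx[mask] / Fy[mask]        # dy/dx from the implicit function theorem
    observed = dy_dt[mask] / dx_dt[mask]  # dy/dx estimated from the data
    return float(np.mean(np.log1p(np.abs(implied - observed))))


# Toy data on the unit circle, parameterized by t.
t = np.linspace(0.1, 2 * np.pi - 0.1, 400)
x, y = np.cos(t), np.sin(t)
dx_dt = np.gradient(x, t, edge_order=2)
dy_dt = np.gradient(y, t, edge_order=2)

circle = lambda x, y: x**2 + y**2 - 1.0   # true relation
wrong = lambda x, y: x**2 - y**2 - 1.0    # hyperbola: wrong law
trivial = lambda x, y: x - x              # zero residual everywhere, but meaningless

for name, F in [("circle", circle), ("wrong", wrong), ("trivial", trivial)]:
    print(name, implicit_derivative_error(F, x, y, dx_dt, dy_dt))
```

The true circle scores near zero and the hyperbola scores much worse. The trivial candidate receives no credit even though its residual F is exactly zero on every data point.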
