Parsing the Language of Expression: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors

Abstract

Symbolic regression is essential for deriving interpretable expressions that elucidate complex phenomena by exposing the underlying mathematical and physical relationships in data. In this paper, we present a symbolic regression method that integrates symbol priors from diverse scientific domains, including physics, biology, chemistry, and engineering, into the regression process. By systematically analyzing domain-specific expressions, we derive probability distributions of symbols to guide expression generation. We propose novel tree-structured recurrent neural networks (RNNs) that leverage these symbol priors, enabling domain knowledge to steer the learning process. Additionally, we introduce a hierarchical tree structure for representing expressions, in which unary and binary operators are organized to facilitate more efficient learning. To further accelerate training, we compile characteristic expression blocks from each domain and include them in the operator dictionary as ready-made building blocks. Experimental results demonstrate that leveraging symbol priors significantly enhances the performance of symbolic regression, resulting in faster convergence and higher accuracy.
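To make the idea of a symbol prior concrete, the following is a minimal sketch, not the authors' implementation: it estimates a smoothed frequency distribution over symbols from a small corpus of expressions and samples from it. The corpus, the vocabulary, and the helper names `symbol_prior` and `sample_symbol` are all hypothetical, and plain frequency counting stands in for the tree-structured RNN described above.

```python
from collections import Counter
import random

# Hypothetical mini-corpus of expressions in prefix notation (illustrative only,
# not data from the paper). "x" is a variable, "c" a constant placeholder.
CORPUS = [
    ["mul", "x", "sin", "x"],
    ["add", "mul", "x", "x", "c"],
    ["exp", "mul", "c", "x"],
]

# Hypothetical symbol vocabulary; "log" never appears in the corpus.
VOCAB = ["add", "mul", "sin", "exp", "log", "x", "c"]


def symbol_prior(corpus, vocabulary, smoothing=1.0):
    """Estimate a probability distribution over symbols by counting tokens.

    Additive smoothing keeps unseen symbols at a small nonzero probability,
    so a search guided by the prior can still reach them.
    """
    counts = Counter(tok for expr in corpus for tok in expr)
    total = sum(counts[s] + smoothing for s in vocabulary)
    return {s: (counts[s] + smoothing) / total for s in vocabulary}


def sample_symbol(prior, rng):
    """Draw one symbol according to the prior."""
    symbols = list(prior)
    weights = [prior[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=1)[0]


prior = symbol_prior(CORPUS, VOCAB)
rng = random.Random(0)  # fixed seed for reproducibility
draws = [sample_symbol(prior, rng) for _ in range(5)]
```

In a fuller system, a distribution like `prior` would bias the generator's output probabilities rather than be sampled directly; how the paper combines the prior with the RNN is not specified in the abstract.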

@article{huang2025_2503.09592,
  title={Parsing the Language of Expression: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors},
  author={Sikai Huang and Yixin Berry Wen and Tara Adusumilli and Kusum Choudhary and Haizhao Yang},
  journal={arXiv preprint arXiv:2503.09592},
  year={2025}
}