Thinking Outside the Template with Modular GP-GOMEA

2 May 2025

Abstract

The goal in Symbolic Regression (SR) is to discover expressions that accurately map input to output data. Because often the intent is to understand these expressions, there is a trade-off between accuracy and the interpretability of expressions. GP-GOMEA excels at producing small SR expressions (increasing the potential for interpretability) with high accuracy, but requires a fixed tree template, which limits the types of expressions that can be evolved. This paper presents a modular representation for GP-GOMEA that allows multiple trees to be evolved simultaneously that can be used as (functional) subexpressions. While each tree individually is constrained to a (small) fixed tree template, the final expression, if expanded, can exhibit a much larger structure. Furthermore, the use of subexpressions decomposes the original regression problem and opens the possibility for enhanced interpretability through the piece-wise understanding of small subexpressions. We compare the performance of GP-GOMEA with and without modular templates on a variety of datasets. We find that our proposed approach generally outperforms single-template GP-GOMEA and can moreover uncover ground-truth expressions underlying synthetic datasets with modular subexpressions at a faster rate than GP-GOMEA without modular subexpressions.

View on arXiv

@article{harrison2025_2505.01262,
  title={ Thinking Outside the Template with Modular GP-GOMEA },
  author={ Joe Harrison and Peter A.N. Bosman and Tanja Alderliesten },
  journal={arXiv preprint arXiv:2505.01262},
  year={ 2025 }
}

Comments on this paper