Simple Simultaneous Ensemble Learning in Genetic Programming

Learning ensembles can substantially improve the generalization performance of low-bias, high-variance estimators such as deep decision trees and deep nets. Improvements have also been found when Genetic Programming (GP) is used to learn the estimators. Yet, the best way to learn ensembles in GP remains to be determined, especially considering that the population of GP can be exploited to learn ensemble members simultaneously. Some existing works do not exploit the population at all, i.e., they evolve each ensemble member separately, and are thus simple but expensive. Other works achieve simultaneous ensemble learning by means of rather involved mechanisms, and are thus efficient but also potentially hard to adopt in practice (e.g., due to the interplay of several hyper-parameters). This work attempts to fill the gap between existing works by proposing a new GP algorithm that is both simple and performant, named Simple Simultaneous Ensemble Genetic Programming (2SEGP). 2SEGP is obtained by minor modifications to the fitness evaluation and selection of a classic GP algorithm, and its only drawback is an (arguably small) increase of the fitness evaluation cost from the classic $O(n\ell)$ to $O(n(\ell+\beta))$, with $n$ the number of observations and $\ell$/$\beta$ the estimator/ensemble size. Experimental comparisons on 9 datasets spanning supervised classification and regression show that, despite its simplicity, 2SEGP fares very competitively against state-of-the-art GP algorithms, both ensemble-based and not. Because our algorithm is simple, efficient, and effective, we believe it can be of interest to the community as well as to practitioners.
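To make the cost claim concrete, the following is a minimal NumPy sketch of how a simultaneous-ensemble fitness evaluation of this kind can be organized; it is an illustration under our own assumptions, not the paper's reference implementation, and `individual.predict`, `beta`, and the bootstrap index matrix are hypothetical names. Each individual's outputs on the $n$ observations are computed once in $O(n\ell)$, then scored against $\beta$ fixed bootstrap samples in $O(n\beta)$, giving the $O(n(\ell+\beta))$ total mentioned above.

```python
import numpy as np

def evaluate_population(population, X, y, beta, rng):
    """Sketch of a simultaneous-ensemble fitness evaluation.

    Returns a (pop_size, beta) matrix: the fitness of each individual
    on each of beta fixed bootstrap samples of the training set.
    """
    n = len(y)
    # One fixed set of bootstrap samples (index matrix of shape (beta, n)),
    # assumed to be drawn once and reused across the whole evolutionary run.
    bootstrap_idx = rng.integers(0, n, size=(beta, n))

    fitness = np.empty((len(population), beta))
    for i, individual in enumerate(population):
        outputs = individual.predict(X)   # computed once: O(n * ell)
        errors = (outputs - y) ** 2       # per-observation squared errors: O(n)
        # Mean squared error on each bootstrap sample: O(n * beta) in total.
        fitness[i] = errors[bootstrap_idx].mean(axis=1)
    return fitness
```

A per-bootstrap-sample view of fitness like this is what allows selection to maintain all $\beta$ ensemble members within a single population; the exact selection scheme used by 2SEGP is described in the paper.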