Genetic Algorithms (GAs) are powerful metaheuristics used mostly in research. Executing GAs sequentially requires considerable computational power. However, GAs are naturally parallelizable, and renting a cluster on a Cloud platform is easy and cheap. One of the most common solutions for parallel applications is Apache Hadoop. What is not simple is developing parallel GAs without having to deal with the inner workings of Hadoop. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. This paper describes a framework for developing parallel GAs that run on the Hadoop platform, following the MapReduce paradigm. The main purpose of the framework is to let the user focus on the aspects of the GA that are specific to the problem being addressed, with the assurance that the task will be executed correctly on the cluster and with good performance. The framework has also been used to develop an application for the Feature Subset Selection problem. A preliminary performance analysis of the resulting GA application has been carried out on three datasets.