A Two-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments

Aayush Adhikari
Sandesh Bhatta
Harendra S. Jangwan
Amit Mishra
Khair Ul Nisa
Abu Taha Zamani
Aaron Sapkota
Debendra Muduli
Nikhat Parveen
Main: 20 Pages
8 Figures
Bibliography: 2 Pages
Abstract

High dimensionality in datasets produced by microarray technology presents a challenge for Machine Learning (ML) algorithms, particularly in terms of dimensionality reduction and handling imbalanced sample sizes. To mitigate these problems, we propose a hybrid ensemble feature selection technique with a majority voting classifier for microarray classification. We consider both filter- and wrapper-based feature selection techniques, including Mutual Information (MI), Chi-Square, Variance Threshold (VT), Least Absolute Shrinkage and Selection Operator (LASSO), Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE), followed by Particle Swarm Optimization (PSO) to select the optimal features. This Artificial Intelligence (AI) approach leverages a Majority Voting Classifier that combines multiple machine learning models, such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to enhance overall performance and accuracy. By leveraging the strengths of each model, the ensemble approach aims to provide more reliable and effective diagnostic predictions. The efficacy of the proposed model has been tested in both local and cloud environments. In the cloud environment, three virtual machines with virtual Central Processing Unit (vCPU) sizes of 8, 16, and 64 bits were used to demonstrate the model's performance. The experiments show that the 64-bit vCPU configuration provides better classification accuracies of 95.89%, 97.50%, 99.13%, 99.58%, 99.11%, and 94.60% on six microarray datasets, Mixed Lineage Leukemia (MLL), Leukemia, Small Round Blue Cell Tumors (SRBCT), Lymphoma, Ovarian, and Lung, respectively, validating the effectiveness of the proposed model in both local and cloud environments.
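
The two voting ideas named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and several parts are assumptions. The three scoring functions are simplified stand-ins rather than MI, Chi-Square, LASSO, ANOVA, or RFE. The rule "keep a feature if a strict majority of selectors rank it in their top k" is one plausible way to combine selectors, since the abstract does not state the combination rule. The PSO refinement stage and the LR, RF, and XGBoost models are omitted. All function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def variance_scores(X, y):
    # Stand-in for a variance-based filter: population variance per feature.
    return [pvariance(col) for col in zip(*X)]


def mean_gap_scores(X, y):
    # Stand-in for an ANOVA-like filter on two classes: gap between class means.
    scores = []
    for col in zip(*X):
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def covariance_scores(X, y):
    # Stand-in for a dependence-based filter: |covariance| with the label.
    y_bar = mean(y)
    scores = []
    for col in zip(*X):
        c_bar = mean(col)
        scores.append(abs(mean((v - c_bar) * (t - y_bar) for v, t in zip(col, y))))
    return scores


def ensemble_select(X, y, scorers, k):
    """Keep features placed in the top k by a strict majority of scorers."""
    votes = Counter()
    for scorer in scorers:
        votes.update(top_k(scorer(X, y), k))
    return sorted(j for j, n in votes.items() if n > len(scorers) / 2)


def majority_vote(predictions):
    """Hard voting: for each sample, return the most common predicted label."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


# Toy data (hypothetical): 6 samples x 4 features, binary labels.
# Feature 0 separates the classes, feature 1 is high-variance noise,
# feature 2 carries a weak signal, feature 3 is constant.
X = [
    [5.0, 10.0, 2.0, 0.1],
    [5.2, 0.0, 2.1, 0.1],
    [4.9, 5.0, 1.9, 0.1],
    [1.0, 10.0, 1.5, 0.1],
    [1.1, 0.0, 1.6, 0.1],
    [0.9, 5.0, 1.4, 0.1],
]
y = [1, 1, 1, 0, 0, 0]

selected = ensemble_select(
    X, y, [variance_scores, mean_gap_scores, covariance_scores], k=2
)

# Predictions from three hypothetical models on four samples.
combined = majority_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])
```

On this toy data the variance scorer alone would keep the noisy feature 1, but the other two scorers outvote it, which is the motivation for combining selectors before a second-stage search such as PSO.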
