ResearchTrend.AI
arXiv:2002.12104
High-Dimensional Feature Selection for Genomic Datasets

27 February 2020
M. Afshar
Hamid Usefi
Abstract

A central problem in machine learning and pattern recognition is identifying the most important features. In this paper, we propose a new feature selection method (DRPT) that first removes irrelevant features and then detects correlations among the remaining ones. Let $D = [A \mid \mathbf{b}]$ be a dataset, where $\mathbf{b}$ is the class label and $A$ is a matrix whose columns are the features. We solve $A\mathbf{x} = \mathbf{b}$ by least squares using the pseudo-inverse of $A$. Each component of $\mathbf{x}$ can be viewed as a weight assigned to the corresponding column (feature). We define a threshold based on the local maxima of $\mathbf{x}$ and remove those features whose weights are smaller than the threshold. To detect correlations in the reduced matrix, which we still call $A$, we consider a perturbation $\tilde{A}$ of $A$. We prove that correlations are encoded in $\Delta\mathbf{x} = |\mathbf{x} - \tilde{\mathbf{x}}|$, where $\tilde{\mathbf{x}}$ is the least squares solution of $\tilde{A}\tilde{\mathbf{x}} = \mathbf{b}$. We cluster features first based on $\Delta\mathbf{x}$ and then using the entropy of features. Finally, a feature is selected from each sub-cluster based on its weight and entropy. The effectiveness of DRPT has been verified through a series of comparisons with seven state-of-the-art feature selection methods over ten genetic datasets ranging from 9,117 to 267,604 features. The results show that, overall, DRPT compares favorably with each of these feature selection algorithms in several aspects.
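The pipeline the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the thresholding rule (the paper derives it from local maxima of $\mathbf{x}$) is replaced here with a simple mean-of-magnitudes stand-in, the perturbation is assumed to be additive Gaussian noise, and the entropy-based sub-clustering step is omitted.

```python
import numpy as np

def drpt_sketch(A, b, noise_scale=1e-3, seed=0):
    """Simplified sketch of the DRPT idea: least-squares weights via the
    pseudo-inverse, a threshold on the weights, then a perturbation to
    expose correlated features through the change in the solution."""
    rng = np.random.default_rng(seed)

    # Step 1: solve Ax = b by least squares using the pseudo-inverse of A.
    x = np.linalg.pinv(A) @ b

    # Step 2: drop features whose |weight| falls below a threshold.
    # (Stand-in rule: the mean of |x|; the paper uses local maxima of x.)
    w = np.abs(x)
    keep = w >= w.mean()
    A_red = A[:, keep]

    # Step 3: perturb the reduced matrix and re-solve. Correlations are
    # reflected in delta_x = |x - x_tilde|, which the paper then uses
    # (with feature entropy) to cluster the remaining features.
    A_pert = A_red + noise_scale * rng.standard_normal(A_red.shape)
    x_red = np.linalg.pinv(A_red) @ b
    x_tilde = np.linalg.pinv(A_pert) @ b
    delta_x = np.abs(x_red - x_tilde)

    return keep, x_red, delta_x
```

On a synthetic matrix with a few thousand columns, `keep` marks the features surviving the weight threshold, and large entries of `delta_x` flag features whose weights are unstable under perturbation, i.e. candidates for the correlated clusters.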

View on arXiv