
arXiv:1409.7794 (v3, latest)

Large-scale Online Feature Selection for Ultra-high Dimensional Sparse Data

27 September 2014
Yue-bo Wu
Guosheng Lin
Tao Mei
Nenghai Yu
Abstract

Feature selection is an important technique in machine learning and pattern classification, especially for high-dimensional data. Most existing methods are neither accurate enough nor fast enough when handling large-scale, ultra-high-dimensional data. To address this open challenge, we present a simple but effective second-order online feature selection algorithm that is extremely efficient and scales to very large data sizes and ultra-high dimensionality. Unlike conventional methods, the proposed algorithm exploits second-order information to select the most confident weights while keeping the weight distribution close to the non-truncated distribution. We conducted extensive experiments comparing both online and batch feature selection techniques. The results show that our technique not only outperforms existing online algorithms but also achieves accuracy highly competitive with state-of-the-art batch feature selection methods, at orders of magnitude lower computational cost. Impressively, on a billion-scale synthetic dataset (1 billion dimensions, 1 billion nonzero features, and 1 million samples), our algorithm took only eight minutes on a single ordinary machine.
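To make the idea concrete, here is a minimal sketch of one online update in the style the abstract describes: maintain a diagonal Gaussian over the weights (a second-order model), update mean and variance on each example, then truncate by keeping only the most confident (lowest-variance) nonzero weights. The function name, the AROW-style update rule, and all parameters are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def second_order_fs_step(mu, sigma, x, y, r=1.0, B=100):
    """One sketched step of second-order online feature selection.

    mu    : weight means (d,)
    sigma : per-weight variances (d,); small variance = high confidence
    x, y  : example vector and label in {+1, -1}
    r     : regularization of the variance update (assumed hyperparameter)
    B     : budget of nonzero weights to keep after truncation
    """
    margin = y * np.dot(mu, x)
    if margin < 1.0:                               # suffered hinge loss: update
        v = np.dot(sigma * x, x)                   # second-order confidence term
        beta = 1.0 / (v + r)
        alpha = (1.0 - margin) * beta
        mu = mu + alpha * y * (sigma * x)          # move mean along x, scaled by variance
        sigma = sigma - beta * (sigma * x) ** 2    # shrink variance on observed features
    # truncation: keep only the B most confident (smallest-variance) nonzeros
    nz = np.flatnonzero(mu)
    if nz.size > B:
        drop = nz[np.argsort(sigma[nz])[B:]]       # least confident nonzero weights
        mu = mu.copy()
        mu[drop] = 0.0
    return mu, sigma
```

Because each update touches only the nonzero coordinates of `x` and the truncation bounds the model size at `B`, a sparse implementation of this scheme costs time proportional to the number of nonzero features per example, which is what makes the billion-dimensional setting feasible.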
