Feature Selection and Junta Testing are Statistically Equivalent

Appendix: 32 pages
Abstract

For a function $f \colon \{0,1\}^n \to \{0,1\}$, the junta testing problem asks whether $f$ depends on only $k$ variables. If $f$ depends on only $k$ variables, the feature selection problem asks to find those variables. We prove that these two tasks are statistically equivalent. Specifically, we show that the ``brute-force'' algorithm, which checks for any set of $k$ variables consistent with the sample, is simultaneously sample-optimal for both problems, and the optimal sample size is \[ \Theta\left(\frac 1 \varepsilon \left( \sqrt{2^k \log {n \choose k}} + \log {n \choose k}\right)\right). \]
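
The brute-force check described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's code: the function names `truth_table`, `consistent_sets`, and `brute_force` are hypothetical, and a sample is assumed to be a list of pairs $(x, f(x))$ with $x$ a tuple of $n$ bits. A set $S$ of $k$ variables is treated as consistent with the sample when no two sample points agree on the coordinates in $S$ but carry different labels. The sketch ignores the distance parameter $\varepsilon$ and the choice of sample size, which are the subject of the paper's analysis.

```python
from itertools import combinations, product


def truth_table(f, n):
    """Illustrative helper: label every point of {0,1}^n with f (not a random sample)."""
    return [(x, f(x)) for x in product((0, 1), repeat=n)]


def consistent_sets(samples, n, k):
    """Yield each k-subset S of range(n) on which the labels are a function of x restricted to S."""
    for S in combinations(range(n), k):
        seen = {}  # restriction of x to S -> label first observed for that restriction
        for x, y in samples:
            key = tuple(x[i] for i in S)
            if seen.setdefault(key, y) != y:
                break  # two points agree on S but disagree on the label
        else:
            yield S


def brute_force(samples, n, k):
    """Return some consistent k-subset, or None if the sample rules out every k-subset."""
    return next(consistent_sets(samples, n, k), None)
```

Under these assumptions, a junta tester would accept when `brute_force` returns a set and reject when it returns `None`, while a feature selector would output the returned set. The loop visits all $\binom{n}{k}$ subsets, so the sketch speaks only to sample usage, not to running time.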
