We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_\ell, z_\ell)_{\ell=1}^m$ where $z_\ell = \max_{i \in [k]} \langle \mathbf{x}_\ell, \mathbf{w}_i^*\rangle + \eta_{i,\ell}$, but the maximizing index $i_\ell$ is unobserved. Here, the $\mathbf{x}_\ell$ are assumed to be $\mathcal{N}(0, I_n)$ and the noise vector $\boldsymbol{\eta}_\ell = (\eta_{1,\ell}, \ldots, \eta_{k,\ell})$ is centered and independent of $\mathbf{x}_\ell$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1^*, \ldots, \mathbf{w}_k^* \in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n) \cdot \mathrm{poly}(k, 1/\varepsilon)$ and significantly improved time complexity $\mathrm{poly}(n, k, 1/\varepsilon) + O(\log(k)/\varepsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathrm{poly}(n, 1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment-matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that the noise is Gaussian, $\boldsymbol{\eta}_\ell \sim \mathcal{N}(0, I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression, where the added noise is taken outside the maximum: $z_\ell = \max_{i \in [k]} \langle \mathbf{x}_\ell, \mathbf{w}_i^*\rangle + \eta_\ell$. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] when $\varepsilon$ is not too small, and it leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
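To make the observation model concrete, the following is a minimal NumPy sketch of sampling from the unknown-index setting: each covariate is drawn from $\mathcal{N}(0, I_n)$, the response is the maximum of the $k$ noisy linear responses, and the achieving index is discarded. The ground-truth vectors here are arbitrary illustrative choices, not part of the paper.

```python
import numpy as np

def sample_self_selection(m, n, k, noise_std=1.0, seed=0):
    """Draw m samples (x, z) from the unknown-index self-selection model:
    x ~ N(0, I_n), z = max_i (<x, w_i> + eta_i), with the argmax index hidden.
    The weight vectors W are hypothetical ground truth for illustration."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((k, n))                 # illustrative w_1, ..., w_k
    X = rng.standard_normal((m, n))                 # covariates x_l ~ N(0, I_n)
    eta = noise_std * rng.standard_normal((m, k))   # centered noise, independent of X
    Z = (X @ W.T + eta).max(axis=1)                 # only the max is observed
    return X, Z, W

X, Z, W = sample_self_selection(m=1000, n=5, k=3)
```

Note that the learner sees only `(X, Z)`; recovering `W` from such samples is exactly the estimation task the abstract addresses. Dropping the per-index noise and adding a single `eta` after the `max` instead yields the max-linear regression variant.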