Optimal Estimator for Linear Regression with Shuffled Labels

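This paper considers the task of linear regression with shuffled labels, i.e., $\mathbf{Y} = \mathbf{\Pi}\mathbf{X}\mathbf{B} + \mathbf{W}$, where $\mathbf{Y}\in\mathbb{R}^{n\times m}$, $\mathbf{\Pi}\in\mathbb{R}^{n\times n}$, $\mathbf{X}\in\mathbb{R}^{n\times p}$, $\mathbf{B}\in\mathbb{R}^{p\times m}$, and $\mathbf{W}\in\mathbb{R}^{n\times m}$ represent, respectively, the sensing results, the (unknown or missing) correspondence information, the sensing matrix, the signal of interest, and the additive sensing noise. Given the observation $\mathbf{Y}$ and the sensing matrix $\mathbf{X}$, we propose a one-step estimator to reconstruct $(\mathbf{\Pi}, \mathbf{B})$. From the computational perspective, our estimator's complexity is $O(n^3 + np^2m)$, which is no greater than the larger of the complexities of a linear assignment algorithm (e.g., $O(n^3)$) and a least-squares algorithm (e.g., $O(np^2m)$). From the statistical perspective, we divide the minimum $\mathsf{snr}$ requirement into four regimes, i.e., the unknown, hard, medium, and easy regimes, and present sufficient conditions for correct permutation recovery in each regime: (i) $\mathsf{snr} \geq \Omega(1)$ in the easy regime; (ii) $\mathsf{snr} \geq \Omega(\log n)$ in the medium regime; and (iii) $\mathsf{snr} \geq \Omega\big((\log n)^{c_0}\cdot n^{c_1/\mathrm{srank}(\mathbf{B})}\big)$ in the hard regime, where $c_0, c_1$ are some positive constants and $\mathrm{srank}(\mathbf{B})$ denotes the stable rank of $\mathbf{B}$. Finally, we provide numerical experiments that confirm these claims.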
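The abstract does not spell out the one-step estimator, so the sketch below does not implement it. It only illustrates the data model and the two subroutines whose costs the complexity bound is compared against: a linear assignment step for the permutation and a least-squares step for the signal. To make the assignment step well defined without the paper's construction, it assumes an oracle that knows $\mathbf{B}$, in which case $\min_{\mathbf{\Pi}}\|\mathbf{Y}-\mathbf{\Pi}\mathbf{X}\mathbf{B}\|_F^2$ reduces to $\max_{\mathbf{\Pi}}\langle \mathbf{\Pi}, \mathbf{Y}(\mathbf{X}\mathbf{B})^\top\rangle$ because $\|\mathbf{\Pi}\mathbf{X}\mathbf{B}\|_F$ does not depend on $\mathbf{\Pi}$. The Gaussian designs, the noise level, and the helper names (`simulate`, `oracle_permutation`, `least_squares_signal`, `stable_rank`) are hypothetical choices for illustration. The stable rank uses the standard definition $\|\mathbf{B}\|_F^2/\|\mathbf{B}\|_2^2$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate(n, p, m, sigma, seed=0):
    """Draw (Y, X, B, perm) from Y = Pi X B + W with Gaussian X, B, W (assumed design)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    B = rng.standard_normal((p, m))
    perm = rng.permutation(n)            # row i of Y comes from row perm[i] of X
    W = sigma * rng.standard_normal((n, m))
    Y = (X @ B)[perm] + W
    return Y, X, B, perm


def oracle_permutation(Y, X, B):
    """Oracle step (B known, hypothetical): solve max_Pi <Pi, Y (X B)^T> by linear assignment."""
    score = Y @ (X @ B).T                # n x n score matrix
    _, cols = linear_sum_assignment(score, maximize=True)
    return cols                          # cols[i] = estimated source row of Y[i]


def least_squares_signal(Y, X, perm_hat):
    """Given a permutation estimate, recover B by ordinary least squares."""
    B_hat, *_ = np.linalg.lstsq(X[perm_hat], Y, rcond=None)
    return B_hat


def stable_rank(B):
    """srank(B) = ||B||_F^2 / ||B||_2^2 (standard definition)."""
    return np.linalg.norm(B, "fro") ** 2 / np.linalg.norm(B, 2) ** 2


Y, X, B, perm = simulate(n=50, p=5, m=20, sigma=0.1)
perm_hat = oracle_permutation(Y, X, B)
B_hat = least_squares_signal(Y, X, perm_hat)
print("permutation recovered:", np.array_equal(perm_hat, perm))
print("max |B_hat - B|:", np.abs(B_hat - B).max())
print("srank(B):", stable_rank(B))
```

With a small noise level the oracle recovers the permutation exactly, and least squares on the re-aligned rows then returns an accurate $\hat{\mathbf{B}}$. A practical estimator cannot assume $\mathbf{B}$ is known, which is why the $\mathsf{snr}$ conditions above matter. The stable rank always lies between 1 and $\mathrm{rank}(\mathbf{B})$, so a larger $\mathrm{srank}(\mathbf{B})$ shrinks the $n^{c_1/\mathrm{srank}(\mathbf{B})}$ factor in the hard-regime condition.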