PRIMO: Private Regression in Multiple Outcomes

Abstract

We introduce a new private regression setting we call Private Regression in Multiple Outcomes (PRIMO), inspired by the common situation where a data analyst wants to perform a set of $l$ regressions while preserving privacy, where the features $X$ are shared across all $l$ regressions, and each regression $i \in [l]$ has a different vector of outcomes $y_i$. Naively applying existing private linear regression techniques $l$ times leads to a $\sqrt{l}$ multiplicative increase in error over the standard linear regression setting. We apply a variety of techniques including sufficient statistics perturbation (SSP) and geometric projection-based methods to develop scalable algorithms that outperform this baseline across a range of parameter regimes. In particular, we obtain no dependence on $l$ in the asymptotic error when $l$ is sufficiently large. Empirically, on the task of genomic risk prediction with multiple phenotypes we find that even for values of $l$ far smaller than the theory would predict, our projection-based method improves the accuracy relative to the variant that doesn't use the projection.
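
To make the shared-feature structure concrete, the following is a minimal sketch of a generic sufficient statistics perturbation step for $l$ outcomes stacked as the columns of a matrix $Y$. It is not the paper's algorithm: the function name `ssp_multi_outcome`, the noise scales `sigma_xx` and `sigma_xy`, and the `ridge` term are all illustrative assumptions, and the noise is not calibrated to any privacy budget. It omits the projection step entirely.

```python
import numpy as np

def ssp_multi_outcome(X, Y, sigma_xx, sigma_xy, ridge=1e-3, rng=None):
    """Hypothetical SSP sketch: perturb X^T X once and X^T Y once, then solve.

    X: (n, d) shared features; Y: (n, l) outcomes, one column per regression.
    sigma_xx, sigma_xy: placeholder Gaussian noise scales (assumed, NOT
    calibrated to any (epsilon, delta) guarantee here).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    l = Y.shape[1]

    # Symmetric Gaussian noise for the d x d statistic X^T X.
    E = rng.normal(scale=sigma_xx, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T
    noisy_xtx = X.T @ X + E

    # Independent Gaussian noise for the d x l statistic X^T Y.
    noisy_xty = X.T @ Y + rng.normal(scale=sigma_xy, size=(d, l))

    # X^T X is perturbed only once and reused for all l outcomes.
    # The ridge term is a simple stabiliser; the noisy matrix may still
    # be ill-conditioned for large noise.
    return np.linalg.solve(noisy_xtx + ridge * np.eye(d), noisy_xty)
```

Because $X$ is shared, the $d \times d$ statistic is computed and perturbed a single time, while only the $d \times l$ cross term grows with the number of outcomes. With both noise scales set to zero and `ridge=0.0`, the function reduces to ordinary least squares on every column of $Y$.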
