Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Such data is typically transposable, meaning that the rows, the columns, or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. We extend regularized covariance models, which place an additive penalty on the inverse covariance matrix, to this distribution by placing separate penalties on the covariances of the rows and columns. These so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. Exploiting the structure of our transposable models, we present techniques enabling use of our models with high-dimensional data, and we give a computationally feasible one-step approximation for imputation. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
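To make the model concrete, here is a minimal sketch of sampling from a mean-restricted matrix-variate normal as described above: the mean matrix is built from separate row and column mean vectors, and the covariance factorizes into a row covariance and a column covariance (a Kronecker-structured covariance on the vectorized matrix). All variable names and the specific parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3  # number of rows and columns (illustrative sizes)

# Hypothetical row/column mean vectors and positive-definite covariances
nu = rng.normal(size=n)                      # row mean vector
mu = rng.normal(size=p)                      # column mean vector
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)              # row covariance (positive definite)
B = rng.normal(size=(p, p))
Delta = B @ B.T + p * np.eye(p)              # column covariance (positive definite)

# Mean matrix with separate row and column means: M[i, j] = nu[i] + mu[j]
M = nu[:, None] + mu[None, :]

# Draw X = M + Sigma^{1/2} Z Delta^{1/2} with Z iid standard normal,
# so that vec(X) ~ N(vec(M), Delta ⊗ Sigma)
L_row = np.linalg.cholesky(Sigma)
L_col = np.linalg.cholesky(Delta)
Z = rng.normal(size=(n, p))
X = M + L_row @ Z @ L_col.T
print(X.shape)  # (4, 3)
```

The key point this illustrates is the transposable structure: transposing `X` simply swaps the roles of `Sigma` and `Delta` (and of `nu` and `mu`), so rows and columns can each be treated as features.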