A Nonparametric Normality Test for High-dimensional Data

Abstract

Many statistical methodologies for high-dimensional data assume population normality. Although a few multivariate normality tests have been proposed, to the best of our knowledge, none of them can properly control the type I error when the dimension grows with the number of observations. In this work, we propose a novel nonparametric test that utilizes nearest neighbor information. The proposed method theoretically guarantees asymptotic type I error control in the high-dimensional setting. Simulation studies verify the empirical size of the proposed test when the dimension is larger than the sample size and, at the same time, show that the new test has superior power compared with alternative methods. We also illustrate our approach on a lung cancer data set widely used in the high-dimensional classification literature, where deviation from the normality assumption may lead to completely invalid conclusions.
