In this paper, we establish sample complexity bounds for learning high-dimensional simplices in from noisy data. Specifically, we consider i.i.d. samples uniformly drawn from an unknown simplex in , each corrupted by additive Gaussian noise of unknown variance. We prove an algorithm exists that, with high probability, outputs a simplex within or total variation (TV) distance at most from the true simplex, provided , where is the signal-to-noise ratio. Extending our prior work~\citep{saberi2023sample}, we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance requires at least samples, where denotes the noise variance. In the noiseless scenario, our lower bound matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when , noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
View on arXiv@article{saberi2025_2506.10101, title={ Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes }, author={ Seyed Amir Hossein Saberi and Amir Najafi and Abolfazl Motahari and Babak H. khalaj }, journal={arXiv preprint arXiv:2506.10101}, year={ 2025 } }