How high is `high'? Rethinking the roles of dimensionality in topological data analysis and manifold learning

Main: 9 pages
Figures: 7
Bibliography: 3 pages
Appendix: 18 pages
Abstract

We present a generalised Hanson-Wright inequality and use it to establish new statistical insights into the geometry of data point-clouds. In the setting of a general random function model of data, we clarify the roles played by three notions of dimensionality: ambient intrinsic dimension $p_{\mathrm{int}}$, which measures total variability across orthogonal feature directions; correlation rank, which measures functional complexity across samples; and latent intrinsic dimension, which is the dimension of manifold structure hidden in data. Our analysis shows that in order for persistence diagrams to reveal latent homology and for manifold structure to emerge, it is sufficient that $p_{\mathrm{int}} \gg \log n$, where $n$ is the sample size. Informed by these theoretical perspectives, we revisit the ground-breaking neuroscience discovery of toroidal structure in grid-cell activity made by Gardner et al. (Nature, 2022): our findings reveal, for the first time, evidence that this structure is in fact isometric to physical space, meaning that grid cell activity conveys a geometrically faithful representation of the real world.
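
For orientation, the classical Hanson-Wright inequality, which the abstract says is generalised, can be stated as follows. This is the standard textbook form and not the paper's generalised result. The notation is an assumption introduced here for illustration: $X = (X_1, \dots, X_m)$ has independent, mean-zero, sub-Gaussian coordinates with $\|X_i\|_{\psi_2} \le K$, $A$ is a fixed $m \times m$ matrix, and $c > 0$ is an absolute constant.

```latex
% Classical Hanson-Wright inequality (standard form; the symbols X, A, K, c, m
% are illustrative notation and are not taken from the paper).
\[
  \mathbb{P}\left( \left| X^\top A X - \mathbb{E}\left[ X^\top A X \right] \right| > t \right)
  \le 2 \exp\left( -c \min\left\{ \frac{t^2}{K^4 \|A\|_F^2},\; \frac{t}{K^2 \|A\|_{\mathrm{op}}} \right\} \right),
  \qquad t \ge 0.
\]
```

Here $\|A\|_F$ is the Frobenius norm and $\|A\|_{\mathrm{op}}$ the operator norm. Bounds of this type control how tightly quadratic forms, such as squared distances between data points, concentrate around their expectations.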

@article{sansford2025_2505.16879,
  title={How high is `high'? Rethinking the roles of dimensionality in topological data analysis and manifold learning},
  author={Hannah Sansford and Nick Whiteley and Patrick Rubin-Delanchy},
  journal={arXiv preprint arXiv:2505.16879},
  year={2025}
}