457
v1v2v3 (latest)

Do Vendi Scores Converge with Finite Samples? Truncated Vendi Score for Finite-Sample Convergence Guarantees

Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Main:8 Pages
17 Figures
Bibliography:4 Pages
3 Tables
Appendix:16 Pages
Abstract

Evaluating the diversity of generative models without reference data poses methodological challenges. The reference-free Vendi and RKE scores address this by quantifying the diversity of generated data using matrix-based entropy measures. Among these two, the Vendi score is typically computed via the eigendecomposition of an n×nn \times n kernel matrix constructed from n generated samples. However, the prohibitive computational cost of eigendecomposition for large nn often limits the number of samples used to fewer than 20,000. In this paper, we investigate the statistical convergence of the Vendi and RKE scores under restricted sample sizes. We numerically demonstrate that, in general, the Vendi score computed with standard sample sizes below 20,000 may not converge to its asymptotic value under infinite sampling. To address this, we introduce the tt-truncated Vendi score by truncating the eigenspectrum of the kernel matrix, which is provably guaranteed to converge to its population limit with n=O(t)n=\mathcal{O}(t) samples. We further show that existing Nyström and FKEA approximation methods converge to the asymptotic limit of the truncated Vendi score. In contrast to the Vendi score, we prove that the RKE score enjoys universal convergence guarantees across all kernel functions. We conduct several numerical experiments to illustrate the concentration of Nyström and FKEA computed Vendi scores around the truncated Vendi score, and we analyze how the truncated Vendi and RKE scores correlate with the diversity of image and text data. The code is available atthis https URL.

View on arXiv
Comments on this paper