Can we spot a fake?

Appendix: 16 pages
Abstract

The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's "tricks", and where $r>0$ is a fixed radius. The adversary's choice of $t=t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X+rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r=r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian distributions of real data $X$. The lower bound may fail for not highly symmetric $T$, but we conjecture that this problem can be solved by considering the focused version of the Gaussian width of $T$, which focuses on the most important directions of $T$.
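
To make the quantity in the main result more concrete, the following is a minimal sketch that estimates the standard (unscaled) Gaussian width $w(T) = \mathbb{E} \sup_{t \in T} \langle g, t \rangle$, with $g$ standard normal in $\mathbb{R}^n$, by Monte Carlo. The abstract does not spell out the scaling it applies to the width, so no scaling is attempted here and no value of $r(T)$ is computed. The set $T = \{\pm e_1, \dots, \pm e_n\}$ and the function name `gaussian_width_signed_basis` are assumptions chosen purely for illustration and are not taken from the paper.

```python
import math
import random


def gaussian_width_signed_basis(n, trials=2000, seed=0):
    """Monte Carlo estimate of E sup_{t in T} <g, t> for the illustrative
    set T = {+e_1, -e_1, ..., +e_n, -e_n} (an assumption, not from the paper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # For this T, sup_t <g, t> is the largest |g_i| over the coordinates.
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
    return total / trials


n = 200
width = gaussian_width_signed_basis(n)
# Classical upper bound on the expected maximum of 2n standard Gaussians.
bound = math.sqrt(2.0 * math.log(2 * n))
print(f"estimated width: {width:.3f}, upper bound: {bound:.3f}")
```

For this particular set the supremum reduces to a coordinate-wise maximum, which keeps the sketch short; for a general finite $T$ one would instead take the maximum of the inner products $\langle g, t \rangle$ over all $t \in T$.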
