Data fission: splitting a single data point

Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X), g(X))$ is tractable? As one example, if $X = (X_1, \dots, X_n)$ and $P$ is a product distribution, then for any $m < n$, we can split the sample to define $f(X) = (X_1, \dots, X_m)$ and $g(X) = (X_{m+1}, \dots, X_n)$. Rasines and Young (2022) offer an alternative approach that randomizes $X$ with additive Gaussian noise; this enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically when the errors are non-Gaussian. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
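To make the additive-Gaussian-noise idea concrete, here is a minimal sketch under assumptions the abstract does not state. It takes scalar observations X ~ N(mu, sigma^2) with known sigma, draws independent noise Z ~ N(0, sigma^2), and forms the two parts f = X + tau*Z and g = X - Z/tau. This particular form, the tuning parameter `tau`, and the helper names `gaussian_fission`, `reconstruct`, and `sample_correlation` are illustrative choices, not notation taken from the text. Under these assumptions f and g are jointly Gaussian with zero covariance, hence independent, and X is recovered exactly as (f + tau^2 * g) / (1 + tau^2).

```python
import math
import random


def gaussian_fission(x, sigma, tau, rng):
    """Split each x_i (assumed N(mu_i, sigma^2), sigma known) into (f_i, g_i).

    Illustrative construction: f = x + tau*z and g = x - z/tau, with fresh
    noise z ~ N(0, sigma^2) drawn independently of x.
    """
    z = [rng.gauss(0.0, sigma) for _ in x]
    f = [xi + tau * zi for xi, zi in zip(x, z)]
    g = [xi - zi / tau for xi, zi in zip(x, z)]
    return f, g


def reconstruct(f, g, tau):
    """Recover x from both parts: f + tau^2 * g = (1 + tau^2) * x."""
    return [(fi + tau**2 * gi) / (1.0 + tau**2) for fi, gi in zip(f, g)]


def sample_correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
mu, sigma, tau = 2.0, 1.0, 1.0  # hypothetical values for illustration
x = [rng.gauss(mu, sigma) for _ in range(20000)]

f, g = gaussian_fission(x, sigma, tau, rng)
x_back = reconstruct(f, g, tau)

# Largest reconstruction error; should be at floating-point rounding level.
max_error = max(abs(a - b) for a, b in zip(x, x_back))
# Empirical correlation between the two parts; should be close to zero.
corr_fg = sample_correlation(f, g)
```

In this sketch, larger `tau` puts more noise into f and less into g, so `tau` plays a role loosely analogous to the split fraction m/n in ordinary sample splitting. Neither part alone determines X, because each carries the unknown noise draw.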