
Efficient average-case population recovery in the presence of insertions and deletions

Abstract

Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x\in\{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution ${\cal D}$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from ${\cal D}$ and returning an independent trace of $x^i$. Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size $s \leq \exp(\Theta(n^{1/3}))$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution ${\cal D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers ${\cal D}$ up to total variation distance $\epsilon$ with high probability, given access to independent traces of independent draws from ${\cal D}$. The algorithm runs in time poly$(n,s,1/\epsilon)$ and its sample complexity is poly$(s,1/\epsilon,\exp(\log^{1/3}n))$. This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$.
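The sampling model described in the abstract can be illustrated with a minimal Python sketch. It only simulates how one sample is generated (draw a source string from the distribution, then pass it through the channel) and computes the total variation distance used as the accuracy measure; it is not the recovery algorithm of the paper. The abstract does not fix the channel parameters, so the independent per-bit deletion probability `delete_p`, the geometric insertion rule governed by `insert_p`, and all function names are assumptions made for illustration.

```python
import random


def trace(x, delete_p, insert_p, rng):
    """Return one trace of the bit list x.

    Assumed channel (hypothetical, not taken from the paper): before each
    source bit, uniformly random bits are inserted while a coin with bias
    insert_p keeps coming up heads; the source bit itself is then deleted
    with probability delete_p. Requires insert_p < 1 so the loop terminates.
    """
    out = []
    for bit in x:
        while rng.random() < insert_p:
            out.append(rng.randint(0, 1))  # inserted random bit
        if rng.random() >= delete_p:
            out.append(bit)  # source bit survives deletion
    return out


def sample_population(strings, weights, delete_p, insert_p, rng):
    """Draw a source string according to weights, then return a trace of it."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return trace(x, delete_p, insert_p, rng)


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 16, 3
    # Average-case setting: the support strings are uniformly random.
    strings = [[rng.randint(0, 1) for _ in range(n)] for _ in range(s)]
    weights = [0.5, 0.3, 0.2]
    for _ in range(3):
        print(sample_population(strings, weights, 0.1, 0.1, rng))
```

A recovery algorithm in this setting sees only the outputs of `sample_population`, without the index of the source string, and must output weights whose `total_variation` distance to the true ones is at most $\epsilon$.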
