
Coupling without Communication and Drafter-Invariant Speculative Decoding

International Symposium on Information Theory (ISIT), 2024
Main:15 Pages
3 Figures
Bibliography:3 Pages
1 Table
Abstract

Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to generate a sample $a \sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high a probability as possible. It is well known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance. What if Alice and Bob must solve this same problem without communicating at all? Perhaps surprisingly, with access to public randomness, they can still achieve $Pr[a = b] \geq \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} \geq 1 - 2D_{TV}(P,Q)$. In fact, this bound can be obtained using a simple protocol based on the Weighted MinHash algorithm. In this work, we explore communication-free coupling in greater depth. First, we show that an equally simple protocol based on Gumbel sampling matches the worst-case guarantees of the Weighted MinHash approach, but tends to perform better in practice. Conversely, we prove that both approaches are actually sharp: no communication-free protocol can achieve $Pr[a = b] > \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)}$ in the worst case. Finally, we prove that, for distributions over $n$ items, there exists a scheme that uses just $O(\log(n/\epsilon))$ bits of communication to achieve $Pr[a = b] = 1 - D_{TV}(P,Q) - \epsilon$, i.e., to essentially match optimal coupling. Beyond our theoretical results, we demonstrate an application of communication-free coupling to speculative decoding, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023].
We show that communication-free protocols yield a variant of speculative decoding that we call Drafter-Invariant Speculative Decoding, which has the desirable property that the output of the method is fixed given a fixed random seed, regardless of what drafter is used for speculation.
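
As an illustration of the Gumbel-sampling idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that public randomness can be modeled by a pseudorandom generator seeded with a value both parties know, and it uses the exponential-race form of the Gumbel-max trick (choosing the item that minimizes $E_i / p_i$ for shared $E_i \sim \mathrm{Exp}(1)$), which selects the same item as maximizing $\log p_i + G_i$ with Gumbel noise $G_i = -\log E_i$. The names `coupled_sample`, `total_variation`, and `agreement_rate` are hypothetical helpers introduced only for this example.

```python
import random


def total_variation(p, q):
    """Total variation distance between two distributions on the same items."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def coupled_sample(probs, seed):
    """Sample an index from `probs` using only the shared seed (hypothetical helper).

    Draws one Exp(1) variable E_i per item from the seeded generator and returns
    argmin_i E_i / p_i. This is equivalent to the Gumbel-max rule
    argmax_i (log p_i + G_i) with G_i = -log E_i.
    """
    rng = random.Random(seed)  # assumption: a seeded PRNG stands in for public randomness
    best_index, best_score = None, float("inf")
    for i, p in enumerate(probs):
        e = rng.expovariate(1.0)  # drawn for every item so both parties see the same noise
        if p <= 0.0:
            continue  # zero-probability items can never be selected
        score = e / p
        if score < best_score:
            best_index, best_score = i, score
    return best_index


def agreement_rate(p, q, trials=20000):
    """Fraction of shared seeds on which Alice (using p) and Bob (using q) agree."""
    hits = sum(coupled_sample(p, s) == coupled_sample(q, s) for s in range(trials))
    return hits / trials


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    Q = [0.4, 0.4, 0.2]
    d = total_variation(P, Q)
    print("D_TV:", d)
    print("lower bound (1 - d) / (1 + d):", (1 - d) / (1 + d))
    print("empirical agreement:", agreement_rate(P, Q))
```

In this sketch each party's output depends only on its own distribution and the shared seed, so neither party needs to know the other's distribution. The empirical agreement rate can be compared against the bound $\frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)}$ stated in the abstract.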
