Probabilistic language models, e.g. those based on an LSTM, often face the problem of finding a high probability prediction from a sequence of random variables over a set of tokens. This is commonly addressed using a form of greedy decoding such as beam search, where a limited number of highest-likelihood paths (the beam width) of the decoder are kept, and at the end the maximum-likelihood path is chosen. In this work, we construct a quantum algorithm to find the globally optimal parse (i.e. for infinite beam width) with high constant success probability. When the input to the decoder is distributed as a power-law with exponent , our algorithm has runtime , where is the alphabet size, the input length; here , and exponentially fast with increasing , hence making our algorithm always more than quadratically faster than its classical counterpart. We further modify our procedure to recover a finite beam width variant, which enables an even stronger empirical speedup while still retaining higher accuracy than possible classically. Finally, we apply this quantum beam search decoder to Mozilla's implementation of Baidu's DeepSpeech neural net, which we show to exhibit such a power law word rank frequency.
View on arXiv