Generating (Formulaic) Text by Splicing Together Nearest Neighbors

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Abstract

We propose to tackle conditional text generation tasks, especially those which require generating formulaic text, by splicing together segments of text from retrieved "neighbor" source-target pairs. Unlike recent work that conditions on retrieved neighbors in an encoder-decoder setting but generates text token-by-token, left-to-right, we learn a policy that directly manipulates segments of neighbor text (i.e., by inserting or replacing them) to form an output. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way allow for interpretable table-to-text and headline generation that is competitive with or better than state-of-the-art autoregressive token-level policies in terms of automatic metrics, and moreover allow for faster decoding.
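The core idea of splicing neighbor text can be illustrated with a toy sketch (this is not the paper's learned policy, and the field names and data below are invented): given a retrieved neighbor source-target pair and a new source, segments of the neighbor's target that realize neighbor-source field values are replaced with the new source's values for the same fields.

```python
def splice(neighbor_source, neighbor_target, new_source):
    """Toy splicing: replace each segment of the neighbor target that
    matches a neighbor-source field value with the new source's value
    for the same field. A learned policy would instead choose which
    segments to insert or replace."""
    out = neighbor_target
    for field, old_value in neighbor_source.items():
        if field in new_source and old_value in out:
            out = out.replace(old_value, new_source[field])
    return out

# Retrieved neighbor pair and a new source record (all invented).
neighbor_source = {"name": "Alice", "city": "Paris"}
neighbor_target = "Alice lives in Paris."
new_source = {"name": "Bob", "city": "Rome"}

print(splice(neighbor_source, neighbor_target, new_source))
# -> Bob lives in Rome.
```

This string-matching heuristic only works when field values appear verbatim in the neighbor target; the paper's contribution is learning a segment-level editing policy, trained against shortest oracle derivations found by parsing under a weighted context-free grammar.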
