Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

24 May 2024

Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong
Papers citing "Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs"

3 papers shown.

1. Primer: Searching for Efficient Transformers for Language Modeling
   David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le
   17 Sep 2021

2. Consistent Accelerated Inference via Confident Adaptive Transformers
   Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay
   18 Apr 2021

3. Scaling Laws for Neural Language Models
   Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
   23 Jan 2020