Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.08252
Cited By
REST: Retrieval-Based Speculative Decoding
14 November 2023
Zhenyu He
Zexuan Zhong
Tianle Cai
Jason D. Lee
Di He
RALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"REST: Retrieval-Based Speculative Decoding"
9 / 59 papers shown
Title
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu
Sewon Min
Luke Zettlemoyer
Yejin Choi
Hannaneh Hajishirzi
38
50
0
30 Jan 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
26
116
0
26 Jan 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
25
34
0
24 Jan 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng-Huei Lin
Hanling Yi
Hongbin Li
Yifan Yang
Xiaotian Yu
Guangming Lu
Rong Xiao
32
3
0
23 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
29
246
0
19 Jan 2024
Online Speculative Decoding
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
20
50
0
11 Oct 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
366
0
13 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
245
1,071
0
05 Oct 2022
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
44
110
0
16 Feb 2021
Previous
1
2