2405.20314
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
30 May 2024
Wei Zhong
Manasa Bharadwaj
Papers citing "S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs"
7 / 7 papers shown
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, M. Yuan, Jianye Hao, Feng Wu
ReLM, LRM
54 | 0 | 0
03 May 2025

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram
LRM
23 | 0 | 0
08 Apr 2025

S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tinashu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
24 | 0 | 0
02 Jul 2024

SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding
Zhenglin Wang, Jialong Wu, Yilong Lai, Congzhi Zhang, Deyu Zhou
LRM, ReLM
22 | 1 | 0
26 Jun 2024

Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
LRM
24 | 27 | 0
16 Feb 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
120 | 134 | 0
03 Feb 2024

Teaching Machines to Read and Comprehend
Karl Moritz Hermann, Tomás Kociský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom
167 | 3,504 | 0
10 Jun 2015