S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

30 May 2024
Wei Zhong, Manasa Bharadwaj

Papers citing "S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs"

7 / 7 papers shown
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, M. Yuan, Jianye Hao, Feng Wu
ReLM, LRM · 54 · 0 · 0 · 03 May 2025

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram
LRM · 23 · 0 · 0 · 08 Apr 2025

S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tinashu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
24 · 0 · 0 · 02 Jul 2024

SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding
Zhenglin Wang, Jialong Wu, Yilong Lai, Congzhi Zhang, Deyu Zhou
LRM, ReLM · 22 · 1 · 0 · 26 Jun 2024

Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
LRM · 24 · 27 · 0 · 16 Feb 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
120 · 134 · 0 · 03 Feb 2024

Teaching Machines to Read and Comprehend
Karl Moritz Hermann, Tomás Kociský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom
167 · 3,504 · 0 · 10 Jun 2015