Speculative Decoding with Big Little Decoder

Neural Information Processing Systems (NeurIPS), 2023
15 February 2023
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
    MoE

Papers citing "Speculative Decoding with Big Little Decoder"

50 / 103 papers shown
Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios
Luohe Shi
Zuchao Li
Lefei Zhang
Baoyuan Qi
Guoming Liu
Hai Zhao
AI4TS
189
0
0
25 Nov 2025
When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding
Min Fang
Zhihui Fu
Qibin Zhao
Jun Wang
109
0
0
03 Nov 2025
Reject Only Critical Tokens: Pivot-Aware Speculative Decoding
Amir Ziashahabi
Yavuz Faruk Bakman
D. Yaldiz
Mostafa El-Khamy
Sai Praneeth Karimireddy
Salman Avestimehr
113
1
0
01 Nov 2025
Polybasic Speculative Decoding Through a Theoretical Perspective
Ruilin Wang
Huixia Li
Yuexiao Ma
Xiawu Zheng
Fei Chao
Xuefeng Xiao
Rongrong Ji
236
0
0
30 Oct 2025
Batch Speculative Decoding Done Right
Ranran Haoran Zhang
Soumik Dey
Ashirbad Mishra
Hansi Wu
Binbin Li
Rui Zhang
103
0
0
26 Oct 2025
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai
M. Hanawal
MLLM VLM
211
0
0
26 Oct 2025
Fast Inference via Hierarchical Speculative Decoding
Clara Mohri
Haim Kaplan
Tal Schuster
Yishay Mansour
Amir Globerson
194
0
0
22 Oct 2025
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Nikhil Bhendawade
K. Nishu
Arnav Kundu
Chris Bartels
Minsik Cho
Irina Belousova
LRM
332
0
0
15 Oct 2025
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts
Yushu Zhao
Yubin Qin
Yang Wang
Xiaolong Yang
Huiming Han
Shaojun Wei
Yang Hu
Shouyi Yin
MoE
169
0
0
14 Oct 2025
A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness
Fali Wang
Jihai Chen
Shuhua Yang
Ali Al-Lawati
Linli Tang
Hui Liu
Suhang Wang
186
2
0
14 Oct 2025
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
Dachuan Shi
Abedelkadir Asi
Keying Li
Xiangchi Yuan
Leyan Pan
Wenke Lee
Wen Xiao
LRM
154
0
0
06 Oct 2025
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang
Jue Wang
Zhen
Ben Athiwaratkun
Bhuwan Dhingra
Ce Zhang
James Y. Zou
182
0
0
06 Oct 2025
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
Minseo Kim
Coleman Hooper
Aditya Tomar
Chenfeng Xu
Mehrdad Farajtabar
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
174
2
0
05 Oct 2025
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Shijing Hu
Jingyang Li
Zhihui Lu
Pan Zhou
142
0
0
26 Sep 2025
SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
Kanghoon Yoon
Minsub Kim
Sungjae Lee
Joonhyung Lee
Sunghyeon Woo
Yeonjun In
S. Kwon
Chanyoung Park
Dongsoo Lee
122
1
0
26 Sep 2025
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Jing Xiong
Qiujiang Chen
Fanghua Ye
Zhongwei Wan
Chuanyang Zheng
...
Haochen Tan
Haoli Bai
Lifeng Shang
Lingpeng Kong
Ngai Wong
LRM
207
0
0
18 Sep 2025
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai
Xiaozhuan Liang
X. Wang
Jin Ma
Haijin Liang
Jinwen Luo
Xinyu Zuo
Lisheng Duan
Yuyang Yin
Xi Chen
170
1
0
16 Sep 2025
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
Yicheng Ji
Jun Zhang
Heming Xia
Jinpeng Chen
Lidan Shou
Gang Chen
Huan Li
VLM
243
3
0
22 Aug 2025
Confidence-Modulated Speculative Decoding for Large Language Models
Jaydip Sen
Subhasis Dasgupta
Hetvi Waghela
UQLM
297
1
0
21 Aug 2025
Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
Chenyang Le
Yinfeng Xia
Huiyan Li
Manhong Wang
Yutao Sun
Xingyang Ma
Yanmin Qian
88
0
0
15 Aug 2025
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
Yi Zhao
Yajuan Peng
Cam-Tu Nguyen
Zuchao Li
Xiaoliang Wang
Hai Zhao
Xiaoming Fu
216
2
0
03 Aug 2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae
Yujin Kim
Reza Bayat
S. Kim
Jiyoun Ha
...
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Aaron Courville
Se-Young Yun
MoE
297
25
0
14 Jul 2025
OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
R. Ramakrishnan
Zhaocong Yuan
Shaojie Zhuo
Chen Feng
Yicheng Lin
Chenzheng Su
Xiaopeng Zhang
SyDa
347
1
0
03 Jul 2025
TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhou Chen
Zhiqiang Wei
Yuqi Bai
Xue Xiong
Jianmin Wu
3DV
178
6
0
14 Jun 2025
Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse
Zhekai Duan
Yuan Zhang
Shikai Geng
Gaowen Liu
Joschka Boedecker
Chris Xiaoxuan Lu
LRM
277
11
0
09 Jun 2025
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
Zhepei Wei
Wei-Lin Chen
Xinyu Zhu
Yu Meng
OffRL
309
3
0
04 Jun 2025
Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
Yixuan Wang
Yijun Liu
Shiyu Ji
Yuzhuang Xu
Yang Xu
Qingfu Zhu
Wanxiang Che
OffRL LRM
300
1
0
24 May 2025
KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization
Mingbo Song
Heming Xia
Jun Zhang
Chak Tou Leong
Qiancheng Xu
Wenjie Li
Sujian Li
191
1
0
22 May 2025
The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute
Yunho Jin
Gu-Yeon Wei
David Brooks
LRM
426
7
0
20 May 2025
Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification
Jikai Wang
Zhenxu Tian
Jilong Li
Qingrong Xia
Xinyu Duan
Zhefeng Wang
Baoxing Huai
Min Zhang
296
3
0
19 May 2025
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jie Ou
Jinyu Guo
Shuaihong Jiang
Zhaokun Wang
Libo Qin
Shunyu Yao
Wenhong Tian
3DV
519
4
0
19 May 2025
Automatic Task Detection and Heterogeneous LLM Speculative Decoding
Danying Ge
Jianhua Gao
Qizhi Jiang
Yifei Feng
Weixing Ji
231
0
0
13 May 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
Junlin Li
Jianye Hou
Hao Fei
Lijun Wu
Min Zhang
LLMAG LRM
355
13
0
27 Apr 2025
Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks
Yang Liu
Bingjie Yan
Tianyuan Zou
Jianqing Zhang
Zixuan Gu
...
Jiajian Li
Xiaozhou Ye
Ye Ouyang
Qiang Yang
Yanzhe Zhang
ALM
1.0K
4
0
24 Apr 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
467
1
0
14 Apr 2025
Understanding and Optimizing Multi-Stage AI Inference Pipelines
Abhimanyu Bambhaniya
Hanjiang Wu
Suvinay Subramanian
Sudarshan Srinivasan
Souvik Kundu
Amir Yazdanbakhsh
Suvinay Subramanian
Madhu Kumar
Tushar Krishna
1.0K
0
0
14 Apr 2025
The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation
Zhenru Zhang
Ning Li
Qi Liu
Rui Li
W. Gao
Qingyang Mao
Zhenya Huang
Baosheng Yu
Dacheng Tao
RALM
302
0
0
11 Apr 2025
SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
357
1
0
05 Apr 2025
Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding
Aayush Gautam
Susav Shrestha
Narasimha Annapareddy
491
2
0
28 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM OffRL LRM
665
104
0
27 Mar 2025
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2025
Junhyuk So
Jiwoong Shin
Chaeyeon Jang
Eunhyeok Park
DiffM
342
0
0
25 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
1.1K
9
0
09 Mar 2025
AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving
Kaiyu Huang
Yu Wang
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
295
10
0
07 Mar 2025
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
Y. Guo
Yuchen Yang
Zhe Chen
Pingjie Wang
Yusheng Liao
Yujiao Shi
Yanfeng Wang
Yu Wang
HILM
306
2
0
05 Mar 2025
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv
Honglin Guo
Qipeng Guo
Xipeng Qiu
308
1
0
02 Mar 2025
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia
Cunxiao Du
Yongqian Li
Qian Liu
Wenjie Li
309
2
0
01 Mar 2025
Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Maximilian Holsman
Yukun Huang
Bhuwan Dhingra
585
4
0
28 Feb 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
Shanghang Zhang
754
0
0
27 Feb 2025
TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhaoxuan Wu
Zijian Zhou
Arun Verma
Alok Prakash
Daniela Rus
Bryan Kian Hsiang Low
352
3
0
21 Feb 2025
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang
Bairu Hou
Wei Wei
Yujia Bao
Shiyu Chang
VLM
751
25
0
21 Feb 2025