2104.08803
Consistent Accelerated Inference via Confident Adaptive Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Tal Schuster
Adam Fisch
Tommi Jaakkola
Regina Barzilay
AI4TS
Papers citing
"Consistent Accelerated Inference via Confident Adaptive Transformers"
50 / 64 papers shown
Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
192
0
0
06 Nov 2025
Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding
Mingxiao Huo
Jiayi Zhang
Hewei Wang
Jinfeng Xu
Zheyu Chen
Huilin Tai
Yijun Chen
MLLM
VLM
142
1
0
15 Sep 2025
Statistical Methods in Generative AI
Edgar Dobriban
289
3
0
08 Sep 2025
ODE_t(ODE_l): Shortcutting the Time and the Length in Diffusion and Flow Models for Faster Sampling
Denis A. Gudovskiy
Wenzhao Zheng
Tomoyuki Okuno
Yohei Nakata
Kurt Keutzer
215
0
0
26 Jun 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
460
1
0
14 Apr 2025
Language Models Can Predict Their Own Behavior
Dhananjay Ashok
Jonathan May
AI4TS
ReLM
LRM
426
5
0
18 Feb 2025
Uncertainty Guarantees on Automated Precision Weeding using Conformal Prediction
P. Melki
Lionel Bombrun
Boubacar Diallo
Jérôme Dias
Jean-Pierre da Costa
203
1
0
13 Jan 2025
A novel framework for MCDM based on Z numbers and soft likelihood function
Yuanpeng He
220
2
0
26 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
475
5
0
18 Dec 2024
Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu
Yin Yu
Xiaotian Han
Runchao Li
Xianxuan Long
Haotian Yu
Pan Li
SyDa
389
0
0
25 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
531
19
0
05 Nov 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
International Conference on Learning Representations (ICLR), 2024
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
389
20
0
28 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
International Conference on Learning Representations (ICLR), 2024
Cheng-i Wang
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
430
14
0
07 Oct 2024
A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models
International Conference on Machine Learning (ICML), 2024
Taehong Moon
Moonseok Choi
Eunggu Yun
Jongmin Yoon
Gayoung Lee
Jaewoong Cho
Juho Lee
235
4
0
12 Aug 2024
An Efficient Inference Framework for Early-exit Large Language Models
Ruijie Miao
Yihan Yan
Xinshuo Yao
Tong Yang
185
2
0
25 Jul 2024
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe
Oyvind Tafjord
Yonatan Belinkov
Hanna Hajishirzi
Ashish Sabharwal
218
3
0
21 Jul 2024
Fast yet Safe: Early-Exiting with Risk Control
Metod Jazbec
Alexander Timans
Tin Hadži Veljković
K. Sakmann
Dan Zhang
C. A. Naesseth
Eric T. Nalisnick
271
13
0
31 May 2024
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
Chenxi Sun
Hongzhi Zhang
Zijia Lin
Jingyuan Zhang
Fuzheng Zhang
...
Bin Chen
Chengru Song
Chen Zhang
Kun Gai
Deyi Xiong
167
2
0
24 May 2024
CEEBERT: Cross-Domain Inference in Early Exit BERT
Divya J. Bajpai
M. Hanawal
LRM
189
12
0
23 May 2024
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
International Conference on Learning Representations (ICLR), 2024
Nadav Timor
Jonathan Mamou
Daniel Korat
Moshe Berchansky
Oren Pereg
Moshe Wasserblat
Tomer Galanti
Michal Gordon
David Harel
LRM
245
1
0
23 May 2024
Conformal Prediction for Natural Language Processing: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2024
Margarida M. Campos
António Farinhas
Chrysoula Zerva
Mário A. T. Figueiredo
André F. T. Martins
AI4CE
425
36
0
03 May 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
215
14
0
22 Apr 2024
Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models
Zhuoran Jin
Pengfei Cao
Hongbang Yuan
Yubo Chen
Jiexin Xu
Huaijun Li
Xiaojian Jiang
Kang Liu
Jun Zhao
677
73
0
28 Feb 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
619
148
0
26 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
460
66
0
05 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
535
3
0
01 Feb 2024
Early Time Classification with Accumulated Accuracy Gap Control
Liran Ringel
Regev Cohen
Daniel Freedman
Michael Elad
Yaniv Romano
189
8
0
01 Feb 2024
Non-Exchangeable Conformal Language Generation with Nearest Neighbors
Dennis Ulmer
Chrysoula Zerva
André F. T. Martins
374
16
0
01 Feb 2024
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Xuchen Pan
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
245
10
0
01 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
394
119
0
23 Dec 2023
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng
Yihuai Hong
Hongliang Dai
Huiping Zhuang
Cen Chen
277
17
0
19 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
International Conference on Machine Learning (ICML), 2023
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
482
57
0
08 Dec 2023
A Study on the Calibration of In-context Learning
Hanlin Zhang
Yi-Fan Zhang
Yaodong Yu
Dhruv Madeka
Dean Phillips Foster
Eric Xing
Hima Lakkaraju
Sham Kakade
558
22
0
07 Dec 2023
Early-Exit Neural Networks with Nested Prediction Sets
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Metod Jazbec
Patrick Forré
Stephan Mandt
Dan Zhang
Eric T. Nalisnick
UQCV
192
2
0
10 Nov 2023
AutoMix: Automatically Mixing Language Models
Pranjal Aggarwal
Aman Madaan
Ankit Anand
Srividya Pranavi Potharaju
Swaroop Mishra
...
Karthik Kappaganthu
Yiming Yang
Shyam Upadhyay
Manaal Faruqui
Mausam
621
43
0
19 Oct 2023
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang
Gibbeum Lee
Jaewoong Cho
Dimitris Papailiopoulos
Kangwook Lee
224
46
0
12 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Conference on Robot Learning (CoRL), 2023
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
480
308
0
04 Jul 2023
Conformal Language Modeling
International Conference on Learning Representations (ICLR), 2023
Victor Quach
Adam Fisch
Tal Schuster
Adam Yala
J. Sohn
Tommi Jaakkola
Regina Barzilay
554
96
0
16 Jun 2023
On the Expected Size of Conformal Prediction Sets
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Guneet Singh Dhillon
George Deligiannidis
Tom Rainforth
198
17
0
12 Jun 2023
Towards Anytime Classification in Early-Exit Architectures by Enforcing Conditional Monotonicity
Neural Information Processing Systems (NeurIPS), 2023
Metod Jazbec
J. Allingham
Dan Zhang
Eric T. Nalisnick
206
15
0
05 Jun 2023
Uncertainty in Natural Language Processing: Sources, Quantification, and Applications
Mengting Hu
Zhen Zhang
Shiwan Zhao
Shiyu Huang
Bingzhe Wu
BDL
262
52
0
05 Jun 2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Daniel Rotem
Michael Hassid
Jonathan Mamou
Roy Schwartz
196
6
0
04 Jun 2023
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiangxiang Gao
Wei-wei Zhu
Jiasheng Gao
Congrui Yin
VLM
371
17
0
21 May 2023
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Pranjal Aggarwal
Aman Madaan
Yiming Yang
Mausam
LRM
344
81
0
19 May 2023
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
International Conference on Language Resources and Evaluation (LREC), 2023
Alexander Yom Din
Taelin Karidi
Leshem Choshen
Mor Geva
200
80
0
16 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
288
150
0
27 Feb 2023
Adaptive Computation with Elastic Input Sequence
International Conference on Machine Learning (ICML), 2023
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
241
27
0
30 Jan 2023
FlexiViT: One Model for All Patch Sizes
Computer Vision and Pattern Recognition (CVPR), 2022
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim Alabdulmohsin
Filip Pavetić
VLM
422
136
0
15 Dec 2022
Fast Inference from Transformers via Speculative Decoding
International Conference on Machine Learning (ICML), 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
597
1,133
0
30 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Computer Vision and Pattern Recognition (CVPR), 2022
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
212
49
0
21 Nov 2022