2402.15758
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
24 February 2024
Ziqian Zeng
Jiahong Yu
Qianshi Pang
Zihao W. Wang
Huiping Zhuang
Cen Chen
Xiaofeng Zou
Papers citing
"Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens"
7 / 7 papers shown
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu
Jingyang Li
Xingyu Xie
Zhihui Lu
Kim-Chuan Toh
Pan Zhou
38
0
0
16 Feb 2025
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
Hyun Ryu
Eric Kim
72
3
0
20 Nov 2024
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
Bin Xiao
Lujun Gui
Lei Su
Weipeng Chen
18
2
0
01 Aug 2024
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Bin Xiao
Chunan Shi
Xiaonan Nie
Fan Yang
Xiangwei Deng
Lei Su
Weipeng Chen
Bin Cui
21
6
0
01 May 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
123
134
0
03 Feb 2024
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS
ALM
113
389
0
28 Nov 2023
On The Computational Complexity of Self-Attention
Feyza Duman Keles
Pruthuvi Maheshakya Wijewardena
C. Hegde
55
107
0
11 Sep 2022