arXiv: 2407.11798
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
16 July 2024 · LRM
Papers citing "PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation" (6 papers)
- PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models — Haofei Yin, Mengbai Xiao, Rouzhou Lu, Xiao Zhang, Dongxiao Yu, Guanghui Zhang. 05 Apr 2025. (AI4CE)
- Collaborative Speculative Inference for Efficient LLM Inference Serving — Luyao Gao, Jianchun Liu, Hongli Xu, Liusheng Huang. 13 Mar 2025.
- Speculative Decoding and Beyond: An In-Depth Survey of Techniques — Y. Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, S. Zhang. 27 Feb 2025.
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens — Tong Wu, Junzhe Shen, Zixia Jia, Y. Wang, Zilong Zheng. 26 Feb 2025.
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding — Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang. 03 Feb 2024.
- The Falcon Series of Open Language Models — Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra-Aimée Cojocaru, ..., Quentin Malartic, Daniele Mazzotta, Badreddine Noune, B. Pannier, Guilherme Penedo. 28 Nov 2023. (AI4TS, ALM)