Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
arXiv: 2405.18628 · 28 May 2024
Hao Chen, Wayne Luk, Ka-Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
Papers citing "Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference" (4 papers shown):
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
Xilong Xie, Liang Wang, Limin Xiao, Meng Han, L. Sun, S. Zheng, Xiangrong Xu (28 Apr 2025)

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang (03 Feb 2024)

APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Mingdao Liu, Aohan Zeng, Bowen Wang, Peng Zhang, Jie Tang, Yuxiao Dong (12 Jan 2024)

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant (18 Apr 2021)