Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
arXiv: 2405.18628 · 28 May 2024
Hao Chen, Wayne Luk, Ka-Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
Papers citing "Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference" (4 papers shown):
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
Xilong Xie, Liang Wang, Limin Xiao, Meng Han, L. Sun, S. Zheng, Xiangrong Xu (28 Apr 2025)

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang (03 Feb 2024)

APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Mingdao Liu, Aohan Zeng, Bowen Wang, Peng Zhang, Jie Tang, Yuxiao Dong (12 Jan 2024)

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant (18 Apr 2021)