PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System

21 February 2025

Papers citing "PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System"


Title: Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Authors: Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu
Date: 09 May 2025

Title: L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Authors: Qingyuan Liu, Liyan Chen, Yanning Yang, H. Wang, Dong Du, Zhigang Mao, Naifeng Jing, Yubin Xia, Haibo Chen
Date: 24 Apr 2025