Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.15470
Cited By
PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System
21 February 2025
Yintao He
Haiyu Mao
Christina Giannoula
Mohammad Sadrosadati
Juan Gómez Luna
Huawei Li
Xiaowei Li
Ying Wang
O. Mutlu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System"
2 / 2 papers shown
Title
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Zehao Fan
Garrett Gagnon
Zhenyu Liu
Liu Liu
19
0
0
09 May 2025
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu
Liyan Chen
Yanning Yang
H. Wang
Dong Du
Zhigang Mao
Naifeng Jing
Yubin Xia
Haibo Chen
26
0
0
24 Apr 2025
1