SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning
arXiv: 2508.06447 (v2, latest)
8 August 2025
Junyi Chen, Rubing Yang, Yushi Huang, Desheng Hui, Ao Zhou, Jianlei Yang

Papers citing "SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning" (3 papers)

GRATING: Low-Latency and Memory-Efficient Semantic Selection on Device
Jiahao Zhou, Chengliang Lin, Dingji Li, Mingkai Dong, Haibo Chen
17 Oct 2025

Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Shaobo Wang, Jiaming Wang, Jiajun Zhang, C. Wang, Yue Min, ..., Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng Zhang
28 Sep 2025

PDTrim: Targeted Pruning for Prefill-Decode Disaggregation in Inference
Hao Zhang, Mengsi Lyu, Zhuo Chen, Xingrun Xing, Yulong Ao, Yonghua Lin
29 Aug 2025