arXiv:2406.06282
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
10 June 2024
Zhenliang Xue
Yixin Song
Zeyu Mi
Le Chen
Yubin Xia
Haibo Chen
Papers citing
"PowerInfer-2: Fast Large Language Model Inference on a Smartphone"
13 / 13 papers shown
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Y. Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
35
0
0
09 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Y. Chen
J. Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
32
0
0
05 May 2025
Scaling On-Device GPU Inference for Large Generative Models
Jiuqiang Tang
Raman Sarokin
Ekaterina Ignasheva
Grant Jensen
Lin Chen
Juhyun Lee
Andrei Kulik
Matthias Grundmann
26
0
0
01 May 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
H. Zhang
Jun Wang
61
0
0
15 Mar 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
49
0
0
08 Jan 2025
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici
Davide Belli
M. V. Baalen
Amir Jalalirad
Andrii Skliar
Bence Major
Markus Nagel
Paul N. Whatmough
76
0
0
02 Dec 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Y. Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
44
3
0
04 Nov 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
38
13
0
06 Oct 2024
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
43
31
0
24 Sep 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
120
134
0
03 Feb 2024
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Mengwei Xu
Wangsong Yin
Dongqi Cai
Rongjie Yi
Daliang Xu
...
Shangguang Wang
Yuanchun Li
Yunxin Liu
Xin Jin
Xuanzhe Liu
VLM
64
70
0
16 Jan 2024
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
112
114
0
16 Dec 2023
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020