Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
arXiv: 2403.20041 · 29 March 2024
Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie
Papers citing "Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs" (9 / 9 papers shown)
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
Haolin Zhang, Jeff Huang
14 · 0 · 0
09 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
LLMAG
77 · 0 · 0
28 Apr 2025
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
Fengwei Zhou, Jiafei Song, Wenjin Jason Li, Gengjian Xue, Zhikang Zhao, Yichao Lu, Bailin Na
12 · 0 · 0
23 Apr 2025
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu
ObjD, LRM
43 · 31 · 0
24 Sep 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen
43 · 1 · 0
10 Jun 2024
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity
Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, ..., Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu
21 · 9 · 0
05 Jun 2024
WDMoE: Wireless Distributed Large Language Models with Mixture of Experts
Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Ping Zhang
MoE
18 · 9 · 0
06 May 2024
QAQ: Quality Adaptive Quantization for LLM KV Cache
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang
MQ
33 · 10 · 0
07 Mar 2024
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam
3DH
948 · 20,214 · 0
17 Apr 2017