Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2309.04255
Cited By
LLMCad: Fast and Scalable On-device Large Language Model Inference
8 September 2023
Daliang Xu
Wangsong Yin
Xin Jin
Yanzhe Zhang
Shiyun Wei
Mengwei Xu
Xuanzhe Liu
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvotes)
Papers citing
"LLMCad: Fast and Scalable On-device Large Language Model Inference"
34 / 34 papers shown
A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness
Fali Wang
Jihai Chen
Shuhua Yang
Ali Al-Lawati
Linli Tang
Hui Liu
Suhang Wang
181
2
0
14 Oct 2025
P/D-Device: Disaggregated Large Language Model between Cloud and Devices
Yibo Jin
Yixu Xu
Yue-ting Chen
C. Wang
Tao Wang
...
Zhe Wang
Hefei Guo
Hongjie Liu
Wei Lu
Zhengyong Zhang
217
1
0
12 Aug 2025
SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation
ACM Asia Conference on Computer and Communications Security (AsiaCCS), 2025
Yashothara Shanmugarasa
Ming Ding
M. Chamikara
Thierry Rakotoarivelo
PILM
AILaw
445
10
0
15 Jun 2025
Edge-First Language Model Inference: Models, Metrics, and Tradeoffs
SiYoung Jang
Roberto Morabito
288
7
0
22 May 2025
Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission
Seungeun Oh
Jinhyuk Kim
Jihong Park
Seung-Woo Ko
Jinho Choi
Tony Q. S. Quek
Seong-Lyun Kim
272
1
0
17 May 2025
Token Level Routing Inference System for Edge Devices
Jianshu She
Wenhao Zheng
Zhengzhong Liu
Hongyi Wang
Eric P. Xing
Huaxiu Yao
Qirong Ho
239
3
0
10 Apr 2025
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
Hongchao Du
Shangyu Wu
Arina Kharlamova
Nan Guan
Chun Jason Xue
235
5
0
04 Mar 2025
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia
Cunxiao Du
Yongqian Li
Qian Liu
Wenjie Li
303
2
0
01 Mar 2025
DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ting Sun
Penghan Wang
Fan Lai
324
3
0
17 Feb 2025
Edge Graph Intelligence: Reciprocally Empowering Edge Networks with Graph Intelligence
Liekang Zeng
Shengyuan Ye
Xu Chen
Xiaoxi Zhang
Ju Ren
Jian Tang
Yang Yang
Xuemin
Shen
500
12
0
08 Jan 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
478
5
0
18 Dec 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
211
20
0
30 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
IEEE Circuits and Systems Magazine (IEEE CSM), 2024
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
226
19
0
08 Oct 2024
Resource Allocation for Stable LLM Training in Mobile Edge Computing
ACM Interational Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2024
Chang Liu
Jun Zhao
180
12
0
30 Sep 2024
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
495
111
0
24 Sep 2024
Elastic On-Device LLM Service
Wangsong Yin
Rongjie Yi
Daliang Xu
Gang Huang
Mengwei Xu
Xuanzhe Liu
252
7
0
08 Sep 2024
On-Device Language Models: A Comprehensive Review
Jiajun Xu
Zhiyuan Li
Wei Chen
Qun Wang
Xin Gao
Qi Cai
Ziyuan Ling
513
101
0
26 Aug 2024
Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning
International Conference on Parallel Processing (ICPP), 2024
Bei Ouyang
Shengyuan Ye
Liekang Zeng
Tianyi Qian
Jingyi Li
Xu Chen
290
12
0
20 Aug 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
541
155
0
09 Jul 2024
HYDRA: Model Factorization Framework for Black-Box LLM Personalization
Yuchen Zhuang
Haotian Sun
Yue Yu
Rushi Qiang
Qifan Wang
Chao Zhang
Bo Dai
AAML
313
45
0
05 Jun 2024
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Ruslan Svirschevski
Avner May
Zhuoming Chen
Beidi Chen
Zhihao Jia
Max Ryabinin
335
45
0
04 Jun 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
M. Y. Wang
523
42
0
30 May 2024
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities
IEEE Communications Surveys and Tutorials (COMST), 2024
Hao Zhou
Chengming Hu
Ye Yuan
Yufei Cui
Yili Jin
...
Di Wu
Xue Liu
Charlie Zhang
Xianbin Wang
Jiangchuan Liu
319
183
0
17 May 2024
Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models
Chen Zhang
Zhuorui Liu
Dawei Song
LRM
265
8
0
23 Apr 2024
Octopus v2: On-device language model for super agent
Wei Chen
Zhiyuan Li
RALM
295
38
0
02 Apr 2024
MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis
Kleomenis Katevas
Lorenzo Minto
Hamed Haddadi
301
34
0
19 Mar 2024
SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation
Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2024
Yanming Liu
Xinyue Peng
Jiannan Cao
Le Dai
Xingzu Liu
Mingbang Wang
Weihao Liu
SyDa
387
2
0
11 Mar 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
623
149
0
26 Feb 2024
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Shuzhang Zhong
Zebin Yang
Meng Li
Ruihao Gong
Runsheng Wang
Ru Huang
217
12
0
21 Feb 2024
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen
Avner May
Ruslan Svirschevski
Yuhsun Huang
Max Ryabinin
Zhihao Jia
Beidi Chen
389
70
0
19 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
477
66
0
05 Feb 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
462
204
0
15 Jan 2024
Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou
Yanyu Chen
Zicong Hong
Wuhui Chen
Yue Yu
Tao Zhang
Hui Wang
Chuan-fu Zhang
Zibin Zheng
ALM
223
14
0
05 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Xupeng Miao
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
419
121
0
23 Dec 2023
1