LLMCad: Fast and Scalable On-device Large Language Model Inference

8 September 2023

Daliang Xu

Wangsong Yin

Xin Jin

Yanzhe Zhang

Shiyun Wei

Mengwei Xu

Xuanzhe Liu

Papers citing "LLMCad: Fast and Scalable On-device Large Language Model Inference"

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

14 Oct 2025

P/D-Device: Disaggregated Large Language Model between Cloud and Devices

...

12 Aug 2025

SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation

ACM Asia Conference on Computer and Communications Security (AsiaCCS), 2025

Yashothara Shanmugarasa

Ming Ding

M. Chamikara

Thierry Rakotoarivelo

15 Jun 2025

Edge-First Language Model Inference: Models, Metrics, and Tradeoffs

SiYoung Jang

Roberto Morabito

22 May 2025

Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

17 May 2025

Token Level Routing Inference System for Edge Devices

10 Apr 2025

FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference

04 Mar 2025

Tutorial Proposal: Speculative Decoding for Efficient LLM Inference

01 Mar 2025

DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Ting Sun

Penghan Wang

Fan Lai

17 Feb 2025

Edge Graph Intelligence: Reciprocally Empowering Edge Networks with Graph Intelligence

08 Jan 2025

Deploying Foundation Model Powered Agent Services: A Survey

...

18 Dec 2024

A Theoretical Perspective for Speculative Decoding Algorithm

Neural Information Processing Systems (NeurIPS), 2024

30 Oct 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

IEEE Circuits and Systems Magazine (IEEE CSM), 2024

...

Yiran Chen

08 Oct 2024

Resource Allocation for Stable LLM Training in Mobile Edge Computing

ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2024

Chang Liu

Jun Zhao

30 Sep 2024

Small Language Models: Survey, Measurements, and Insights

Zhenyan Lu

Xiang Li

Dongqi Cai

Rongjie Yi

Fangming Liu

Xiwen Zhang

Nicholas D. Lane

Mengwei Xu

24 Sep 2024

Elastic On-Device LLM Service

08 Sep 2024

On-Device Language Models: A Comprehensive Review

Jiajun Xu

Zhiyuan Li

Wei Chen

Qun Wang

Xin Gao

Qi Cai

Ziyuan Ling

26 Aug 2024

Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning

International Conference on Parallel Processing (ICPP), 2024

Xu Chen

20 Aug 2024

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

Guanqiao Qu

Qiyuan Chen

Wei Wei

Zheng Lin

Xianhao Chen

Kaibin Huang

09 Jul 2024

HYDRA: Model Factorization Framework for Black-Box LLM Personalization

Chao Zhang

Bo Dai

05 Jun 2024

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

04 Jun 2024

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang

Xudong Guo

M. Y. Wang

30 May 2024

Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

IEEE Communications Surveys and Tutorials (COMST), 2024

Yufei Cui

...

Xue Liu

17 May 2024

Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models

Chen Zhang

Zhuorui Liu

Dawei Song

23 Apr 2024

Octopus v2: On-device language model for super agent

Wei Chen

Zhiyuan Li

02 Apr 2024

MELTing point: Mobile Evaluation of Language Transformers

Stefanos Laskaridis

Kleomenis Katevas

Lorenzo Minto

Hamed Haddadi

19 Mar 2024

SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation

Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2024

11 Mar 2024

LLM Inference Unveiled: Survey and Roofline Model Insights

Zhihang Yuan

Yuzhang Shang

Yang Zhou

Zhen Dong

Zhe Zhou

...

Yong Jae Lee

Yan Yan

Beidi Chen

Guangyu Sun

Kurt Keutzer

26 Feb 2024

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

21 Feb 2024

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

19 Feb 2024

A Survey on Transformer Compression

05 Feb 2024

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Heming Xia

Zhe Yang

Qingxiu Dong

Peiyi Wang

Zhifang Sui

15 Jan 2024

Training and Serving System of Foundation Models: A Comprehensive Survey

05 Jan 2024

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

23 Dec 2023