arXiv:2410.04466
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
6 October 2024
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
Jun Liu
Yaoxiu Lian
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
Papers citing "Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective"
8 / 8 papers shown
Title
Token Level Routing Inference System for Edge Devices
Jianshu She
Wenhao Zheng
Zhengzhong Liu
Hongyi Wang
Eric P. Xing
Huaxiu Yao
Qirong Ho
31
0
0
10 Apr 2025
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
Wenqi Jiang
Suvinay Subramanian
Cat Graves
Gustavo Alonso
Amir Yazdanbakhsh
Vidushi Dadu
33
5
0
18 Mar 2025
ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
Wenqiang Wang
Yijia Zhang
Zikai Zhang
Guanting Huo
Hao Liang
Shijie Cao
Ningyi Xu
38
0
0
17 Mar 2025
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
Nir Ailon
Akhiad Bercovich
Omri Weinstein
47
0
0
15 Mar 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding
Wentao Jiang
Shunyu Liu
Yongcheng Jing
J. Guo
...
Zengmao Wang
Z. Liu
Bo Du
X. Liu
Dacheng Tao
LRM
36
4
0
22 Feb 2025
SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors
M. Rakka
J. Li
Guohao Dai
A. Eltawil
M. Fouda
Fadi J. Kurdahi
60
0
0
26 Nov 2024
MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation
Toby Godfrey
William Hunt
Mohammad D. Soorati
28
1
0
18 Oct 2024
ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents
Jakub Hoscilowicz
Bartosz Maj
Bartosz Kozakiewicz
Oleksii Tymoshchuk
Artur Janicki
LLMAG
36
1
0
09 Oct 2024