arXiv:2502.14837
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
21 February 2025
Tao Ji, B. Guo, Y. Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui
Papers citing "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs" (5 papers shown)
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri, Hubert Strauss, Tri Dao · 27 May 2025
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong, Jiho Jang, Nojun Kwak · 22 May 2025
A3: an Analytical Low-Rank Approximation Framework for Attention
Jeffrey T. H. Wong, Cheng Zhang, Xinye Cao, Pedro Gimenes, George A. Constantinides, Wayne Luk, Yiren Zhao · 19 May 2025
Tags: OffRL, MQ
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs
Pencuo Zeren, Qiuming Luo, Rui Mao, Chang Kong · 13 May 2025
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Cong Wang, ..., Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang · 26 Apr 2025