ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2502.14837
  4. Cited By
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

21 February 2025
Tao Ji
B. Guo
Y. Wu
Qipeng Guo
Lixing Shen
Zhan Chen
Xipeng Qiu
Qi Zhang
Tao Gui
ArXiv (abs)PDFHTML

Papers citing "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs"

5 / 5 papers shown
Title
Hardware-Efficient Attention for Fast Decoding
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri
Hubert Strauss
Tri Dao
66
2
0
27 May 2025
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong
Jiho Jang
Nojun Kwak
88
0
0
22 May 2025
A3 : an Analytical Low-Rank Approximation Framework for Attention
A3 : an Analytical Low-Rank Approximation Framework for Attention
Jeffrey T. H. Wong
Cheng Zhang
Xinye Cao
Pedro Gimenes
George A. Constantinides
Wayne Luk
Yiren Zhao
OffRLMQ
133
1
0
19 May 2025
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs
Pencuo Zeren
Qiuming Luo
Rui Mao
Chang Kong
24
0
0
13 May 2025
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu
Wanxu Zhao
Xin Zhou
Chenxin An
Cong Wang
...
Jun Zhao
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
102
0
0
26 Apr 2025
1