Hardware-Efficient Attention for Fast Decoding

27 May 2025
Ted Zadouri, Hubert Strauss, Tri Dao
arXiv: 2505.21487

Papers citing "Hardware-Efficient Attention for Fast Decoding"

9 papers shown

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, ..., Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang
VLM, LRM · 10 Jun 2025

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Pol G. Recasens, Ferran Agullo, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Jordi Torres, Josep Ll. Berral
11 Mar 2025

Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su, Wei Zhao, Xuelong Li, Muralidhar Andoorveedu, Chenhao Jiang, Zhanda Zhu, Kevin Song, Christina Giannoula, Gennady Pekhimenko
LRM · 09 Mar 2025

Slim attention: cut your context memory in half without loss -- K-cache is all you need for MHA
Nils Graef, Matthew Clapp
07 Mar 2025

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Tao Ji, B. Guo, Y. Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui
21 Feb 2025

Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwarak Talupuru, Hangyu Lin, David Cairuz, Phil Blunsom, Acyr Locatelli
30 Jan 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM · 22 Jan 2025

Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao
11 Jan 2025

Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Veličković
08 Oct 2024