Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

17 May 2024
Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Rühle, Saravan Rajmohan

Papers citing "Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers" (5 of 5 papers shown)

ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism
Venmugil Elango
20 Mar 2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, ..., Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
02 Jan 2025

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
Aditya K Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter, R. Ramjee, Ashish Panwar
23 Oct 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao
11 Jul 2024

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, R. Ramjee, Ashish Panwar
07 May 2024