Optimised Grouped-Query Attention Mechanism for Transformers

21 June 2024
Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao
ArXiv (abs) · PDF · HTML
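
For context on the mechanism the paper optimises, below is a minimal sketch of standard grouped-query attention (GQA), in which several query heads share each key/value head so the KV cache shrinks by the group factor. This illustrates plain GQA only, not the paper's specific optimisation; the tensor shapes, the `repeat_interleave` sharing scheme, and the example sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention sketch.

    q:    (batch, n_q_heads,  seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim)
    Each group of n_q_heads // n_kv_heads query heads shares
    one key/value head, so only n_kv_heads K/V heads are cached.
    """
    n_q_heads, head_dim = q.shape[1], q.shape[-1]
    group_size = n_q_heads // k.shape[1]
    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)  # -> (batch, n_q_heads, seq, head_dim)
    v = v.repeat_interleave(group_size, dim=1)
    # Standard scaled dot-product attention over the expanded heads.
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads sharing 2 KV heads (4 query heads per group).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

With `n_kv_heads == n_q_heads` this reduces to ordinary multi-head attention, and with `n_kv_heads == 1` to multi-query attention; GQA interpolates between the two.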

Papers citing "Optimised Grouped-Query Attention Mechanism for Transformers" (5 of 5 shown)

1. Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing
   Zhaoyuan Su, Tingfeng Lan, Zirui Wang, Juncheng Yang, Yue Cheng
   24 May 2025

2. Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks
   Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty
   17 Apr 2025

3. Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
   Nir Ailon, Akhiad Bercovich, Omri Weinstein
   15 Mar 2025

4. SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
   Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan
   24 Feb 2025

5. Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention
   Zohaib Khan, Muhammad Khaquan, Omer Tafveez, Burhanuddin Samiwala, Agha Ali Raza
   15 Aug 2024