Optimised Grouped-Query Attention Mechanism for Transformers
arXiv:2406.14963 · 21 June 2024
Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao
Papers citing "Optimised Grouped-Query Attention Mechanism for Transformers" (5 of 5 papers shown):
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing
Zhaoyuan Su, Tingfeng Lan, Zirui Wang, Juncheng Yang, Yue Cheng (24 May 2025)
Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks
Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty (17 April 2025)
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
Nir Ailon, Akhiad Bercovich, Omri Weinstein (15 March 2025)
SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan (24 February 2025)
Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention
Zohaib Khan, Muhammad Khaquan, Omer Tafveez, Burhanuddin Samiwala, Agha Ali Raza (15 August 2024)