Scalable-Softmax Is Superior for Attention
Ken M. Nakanishi · 31 January 2025 · arXiv 2501.19399
Papers citing "Scalable-Softmax Is Superior for Attention" (10 of 10 papers shown)

Machine-Learning Accelerated Calculations of Reduced Density Matrices
Awwab A. Azam, Lexu Zhao, Jiabin Yu · 10 Nov 2025 · AI4CE

Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix
Tomohiro Hayase, B. Collins, Ryo Karakida · 08 Oct 2025

Critical attention scaling in long-context transformers
Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet · 07 Oct 2025 · LRM

Allocation of Parameters in Transformers
Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li · 04 Oct 2025 · MoE

A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi · 29 Sep 2025

AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs
S. Shah, Saurav Prakash, Balaraman Ravindran · 14 Sep 2025

RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Xiuying Wei, Anunay Yadav, Razvan Pascanu, Çağlar Gülçehre · 06 Jul 2025 · AI4TS

Scale-invariant Attention
Ben Anson, Xi Wang, Laurence Aitchison · 20 May 2025 · LRM

Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Chris Köcher · 15 May 2025

Multi-Token Attention
O. Yu. Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar · 01 Apr 2025