Scalable-Softmax Is Superior for Attention
Ken M. Nakanishi
arXiv:2501.19399, 31 January 2025
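
As context for the citing papers below, a minimal sketch of the Scalable-Softmax (SSMax) idea this paper proposes: attention logits are multiplied by s * log(n), where n is the number of attended positions and s is a scaling parameter, before the softmax is applied, so the attention distribution does not flatten as the context grows. The NumPy snippet below is an illustrative sketch under that reading, not the paper's reference implementation; the function name ssmax, the default value of s, and the toy inputs are placeholders.

    # Illustrative sketch of Scalable-Softmax (SSMax); names and defaults are placeholders.
    import numpy as np

    def ssmax(logits: np.ndarray, s: float = 1.0) -> np.ndarray:
        """Scale logits by s * log(n) before the softmax, where n is the
        number of attended positions. With s * log(n) == 1 this reduces
        to the standard softmax."""
        n = logits.shape[-1]
        z = s * np.log(n) * logits
        z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    # Toy usage: attention weights over an 8-position context.
    scores = np.random.default_rng(0).standard_normal(8)
    print(ssmax(scores, s=0.5))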

Papers citing "Scalable-Softmax Is Superior for Attention"

10 citing papers are listed below.
1. Machine-Learning Accelerated Calculations of Reduced Density Matrices. Awwab A. Azam, Lexu Zhao, Jiabin Yu. 10 Nov 2025. (AI4CE)
2. Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix. Tomohiro Hayase, B. Collins, Ryo Karakida. 08 Oct 2025.
3. Critical attention scaling in long-context transformers. Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet. 07 Oct 2025. (LRM)
4. Allocation of Parameters in Transformers. Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li. 04 Oct 2025. (MoE)
5. A multiscale analysis of mean-field transformers in the moderate interaction regime. Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi. 29 Sep 2025.
6. AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs. S. Shah, Saurav Prakash, Balaraman Ravindran. 14 Sep 2025.
7. RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling. Xiuying Wei, Anunay Yadav, Razvan Pascanu, Çağlar Gülçehre. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025. 06 Jul 2025. (AI4TS)
8. Scale-invariant Attention. Ben Anson, Xi Wang, Laurence Aitchison. 20 May 2025. (LRM)
9. Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models. Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Chris Köcher. 15 May 2025.
10. Multi-Token Attention. O. Yu. Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. 01 Apr 2025.