Scalable-Softmax Is Superior for Attention
Ken M. Nakanishi
arXiv:2501.19399, 31 January 2025
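
As context for the citing papers below, a minimal sketch of the Scalable-Softmax (SSMax) idea this paper proposes: attention logits are multiplied by s * log(n), where n is the number of attended positions and s is a scaling parameter, before the softmax is applied, so the attention distribution does not flatten as the context grows. The NumPy snippet below is an illustrative sketch under that reading, not the paper's reference implementation; the function name ssmax, the default value of s, and the toy inputs are placeholders.

    # Illustrative sketch of Scalable-Softmax (SSMax); names and defaults are placeholders.
    import numpy as np

    def ssmax(logits: np.ndarray, s: float = 1.0) -> np.ndarray:
        """Scale logits by s * log(n) before the softmax, where n is the
        number of attended positions. With s * log(n) == 1 this reduces
        to the standard softmax."""
        n = logits.shape[-1]
        z = s * np.log(n) * logits
        z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    # Toy usage: attention weights over an 8-position context.
    scores = np.random.default_rng(0).standard_normal(8)
    print(ssmax(scores, s=0.5))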

Papers citing "Scalable-Softmax Is Superior for Attention"

10 citing papers are listed below.
1. Machine-Learning Accelerated Calculations of Reduced Density Matrices. Awwab A. Azam, Lexu Zhao, Jiabin Yu. 10 Nov 2025. (AI4CE)
2. Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix. Tomohiro Hayase, B. Collins, Ryo Karakida. 08 Oct 2025.
3. Critical attention scaling in long-context transformers. Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet. 07 Oct 2025. (LRM)
4. Allocation of Parameters in Transformers. Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li. 04 Oct 2025. (MoE)
5. A multiscale analysis of mean-field transformers in the moderate interaction regime. Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi. 29 Sep 2025.
6. AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs. S. Shah, Saurav Prakash, Balaraman Ravindran. 14 Sep 2025.
7. RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling. Xiuying Wei, Anunay Yadav, Razvan Pascanu, Çağlar Gülçehre. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025. 06 Jul 2025. (AI4TS)
8. Scale-invariant Attention. Ben Anson, Xi Wang, Laurence Aitchison. 20 May 2025. (LRM)
9. Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models. Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Chris Köcher. 15 May 2025.
10. Multi-Token Attention. O. Yu. Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. 01 Apr 2025.