A Study on ReLU and Softmax in Transformer
arXiv 2302.06461
13 February 2023
Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian
Papers citing "A Study on ReLU and Softmax in Transformer" (8 papers)

Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren, Yong Liu
26 Apr 2025

On Space Folds of ReLU Neural Networks (MLT)
Michal Lewandowski, Hamid Eghbalzadeh, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser
17 Feb 2025

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles X. Ling, Boyu Wang
24 Jan 2025

More Expressive Attention with Negative Weights
Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, X. Sun, Zhanhui Kang, Di Wang, Rui Yan
11 Nov 2024

HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song
14 Oct 2024

Attention layers provably solve single-location regression
P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son-Hy
11 Aug 2024

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining (OffRL)
Licong Lin, Yu Bai, Song Mei
12 Oct 2023