A Study on ReLU and Softmax in Transformer

13 February 2023

Junliang Guo

Jiang Bian

Papers citing "A Study on ReLU and Softmax in Transformer"

Title
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity | Ruifeng Ren, Yong Liu | 39 0 0 | 26 Apr 2025
On Space Folds of ReLU Neural Networks | Michal Lewandowski, Hamid Eghbalzadeh, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser | MLT | 73 1 0 | 17 Feb 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention | Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles X. Ling, Boyu Wang | 45 1 0 | 24 Jan 2025
More Expressive Attention with Negative Weights | Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, X. Sun, Zhanhui Kang, Di Wang, Rui Yan | 30 0 0 | 11 Nov 2024
HSR-Enhanced Sparse Attention Acceleration | Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song | 79 18 0 | 14 Oct 2024
Attention layers provably solve single-location regression | P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer | 51 2 0 | 02 Oct 2024
Sampling Foundational Transformer: A Theoretical Perspective | Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son-Hy | 42 0 0 | 11 Aug 2024
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Licong Lin, Yu Bai, Song Mei | OffRL | 27 42 0 | 12 Oct 2023