Replacing softmax with ReLU in Vision Transformers

arXiv:2309.08586 · 15 September 2023
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith
Tags: ViT
Links: ArXiv · PDF · HTML
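
The cited paper's central claim, per its abstract, is that the softmax in attention can be replaced by a pointwise ReLU divided by sequence length, with ReLU-attention ViTs approaching or matching the scaling behaviour of softmax-attention ViTs on ImageNet-21k training. A minimal NumPy sketch of that idea follows; the function names, shapes, and the side-by-side softmax baseline are illustrative assumptions, not the authors' code.

```python
import numpy as np

def relu_attention(q, k, v):
    """Attention with softmax replaced by ReLU / sequence length.

    q, k, v: arrays of shape (seq_len, d_head). Sketch of the scheme in
    "Replacing softmax with ReLU in Vision Transformers" (arXiv:2309.08586):
    weights = relu(q k^T / sqrt(d)) / seq_len.
    """
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)           # (L, L) scaled dot products
    weights = np.maximum(scores, 0.0) / seq_len  # ReLU in place of softmax, scaled by 1/L
    return weights @ v

def softmax_attention(q, k, v):
    """Standard softmax attention, for comparison."""
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: both variants map (L, d_head) inputs to (L, d_head) outputs.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(relu_attention(q, k, v).shape, softmax_attention(q, k, v).shape)
```

The division by sequence length is the load-bearing detail: the paper reports that some normalization of this form is needed for ReLU attention to track softmax attention, since without it the attention weights no longer sum to anything like one.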

Papers citing "Replacing softmax with ReLU in Vision Transformers"

Showing 22 of 22 citing papers.

 1. "Self-Adjust Softmax" · Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Xin Jiang, Z. Li, Yu-Hu Li · 25 Feb 2025
 2. "More Expressive Attention with Negative Weights" · Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, X. Sun, Zhanhui Kang, Di Wang, Rui Yan · 11 Nov 2024
 3. "NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs" [Mamba] · Nursena Köprücü, Destiny Okpekpe, Antonio Orvieto · 31 Oct 2024
 4. "HSR-Enhanced Sparse Attention Acceleration" · Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song · 14 Oct 2024
 5. "Attention layers provably solve single-location regression" · P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer · 02 Oct 2024
 6. "Attention is a smoothed cubic spline" · Zehua Lai, Lek-Heng Lim, Yucong Liu · 19 Aug 2024
 7. "Sampling Foundational Transformer: A Theoretical Perspective" · Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son-Hy · 11 Aug 2024
 8. "Optimized Speculative Sampling for GPU Hardware Accelerators" · Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet · 16 Jun 2024
 9. "Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture" · Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Mengqing Wang · 12 Mar 2024
10. "The Hidden Attention of Mamba Models" [Mamba] · Ameen Ali, Itamar Zimerman, Lior Wolf · 03 Mar 2024
11. "ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models" · Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun · 21 Feb 2024
12. "On Provable Length and Compositional Generalization" [OODD] · Kartik Ahuja, Amin Mansouri · 07 Feb 2024
13. "Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context" · Xiang Cheng, Yuxin Chen, S. Sra · 11 Dec 2023
14. "MobileUtr: Revisiting the relationship between light-weight CNN and Transformer for efficient medical image segmentation" [ViT, MedIm] · Fenghe Tang, Bingkun Nian, Jianrui Ding, Quan Quan, Jie-jin Yang, Wei Liu, S. Kevin Zhou · 04 Dec 2023
15. "MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices" [VLM] · Yang Zhao, Yanwu Xu, Zhisheng Xiao, Haolin Jia, Tingbo Hou · 28 Nov 2023
16. "How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations" · Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai · 16 Oct 2023
17. "Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining" [OffRL] · Licong Lin, Yu Bai, Song Mei · 12 Oct 2023
18. "ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models" · Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta, C. C. D. Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar · 06 Oct 2023
19. "Small-scale proxies for large-scale Transformer training instabilities" · Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, A. Alemi, ..., Jascha Narain Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith · 25 Sep 2023
20. "What can a Single Attention Layer Learn? A Study Through the Random Features Lens" [MLT] · Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei · 21 Jul 2023
21. "Transformers learn to implement preconditioned gradient descent for in-context learning" [ODL] · Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra · 01 Jun 2023
22. "Transformer Quality in Linear Time" · Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le · 21 Feb 2022