Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
arXiv 2103.03404 · 5 March 2021
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas

Papers citing "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth" (50 of 238 papers shown):

Representation Deficiency in Masked Language Modeling
Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
04 Feb 2023

When Layers Play the Lottery, all Tickets Win at Initialization
Artur Jordão, George Correa de Araujo, H. Maia, Hélio Pedrini
25 Jan 2023

A Close Look at Spatial Modeling: From Attention to Convolution [ViT, 3DPC]
Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xing Zhao, Jie Fu, Yun Fu
23 Dec 2022

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu
20 Dec 2022

Non-equispaced Fourier Neural Solvers for PDEs
Haitao Lin, Lirong Wu, Yongjie Xu, Yufei Huang, Siyuan Li, Guojiang Zhao, Stan Z. Li
09 Dec 2022

A K-variate Time Series Is Worth K Words: Evolution of the Vanilla Transformer Architecture for Long-term Multivariate Time Series Forecasting [AI4TS]
Zanwei Zhou, Rui-Ming Zhong, Chen Yang, Yan Wang, Xiaokang Yang, Wei Shen
06 Dec 2022

Spatial-Spectral Transformer for Hyperspectral Image Denoising
Miaoyu Li, Ying Fu, Yulun Zhang
25 Nov 2022

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long, Z. Zhao, Jimin Pi, Sheng-sheng Wang, Jingdong Wang
21 Nov 2022

Convexifying Transformers: Improving Optimization and Understanding of Transformer Networks [MLT]
Tolga Ergen, Behnam Neyshabur, Harsh Mehta
20 Nov 2022

Finding Skill Neurons in Pre-trained Transformer-based Language Models [MILM, MoE]
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
14 Nov 2022

AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie
12 Oct 2022

SML: Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng, Lang Lang, Z. Liu, B. Liu
09 Oct 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022

On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, C. Hegde
11 Sep 2022

Pre-Training a Graph Recurrent Network for Language Representation [GNN]
Yile Wang, Linyi Yang, Zhiyang Teng, M. Zhou, Yue Zhang
08 Sep 2022

Addressing Token Uniformity in Transformers via Singular Value Transformation
Hanqi Yan, Lin Gui, Wenjie Li, Yulan He
24 Aug 2022

Exploring Generative Neural Temporal Point Process [DiffM]
Haitao Lin, Lirong Wu, Guojiang Zhao, Pai Liu, Stan Z. Li
03 Aug 2022

Neural Knowledge Bank for Pretrained Transformers [KELM]
Damai Dai, Wen-Jie Jiang, Qingxiu Dong, Yajuan Lyu, Qiaoqiao She, Zhifang Sui
31 Jul 2022

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm [ViT]
Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, Dacheng Tao
19 Jun 2022

Rank Diminishing in Deep Neural Networks
Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael I. Jordan, Zhengjun Zha
13 Jun 2022

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse
Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurélien Lucchi
07 Jun 2022

Vision GNN: An Image is Worth Graph of Nodes [GNN, 3DH]
Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu
01 Jun 2022

Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing
Jie Chen, Weiqi Liu, Zhizhong Huang, Junbin Gao, Junping Zhang, Jian Pu
30 May 2022

Learning Locality and Isotropy in Dialogue Modeling
Han Wu, Hao Hao Tan, Mingjie Zhan, Gangming Zhao, Shaoqing Lu, Ding Liang, Linqi Song
29 May 2022

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo
26 May 2022

Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
26 May 2022

On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization [AI4CE]
Amir Joudaki, Hadi Daneshmand, Francis R. Bach
25 May 2022

Outlier Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell'Orletta
23 May 2022

A Study on Transformer Configuration and Training Objective
Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You
21 May 2022

Exploring Extreme Parameter Compression for Pre-trained Language Models
Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu
20 May 2022

Causal Transformer for Estimating Counterfactual Outcomes [CML]
Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
14 Apr 2022

Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Peizhao Li, Puzuo Wang, K. Berntorp, Hongfu Liu
03 Apr 2022

Training-free Transformer Architecture Search [ViT]
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji
23 Mar 2022

Unified Visual Transformer Compression [ViT]
Shixing Yu, Tianlong Chen, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Liu, Zhangyang Wang
15 Mar 2022

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [VLM]
Xiaohan Ding, X. Zhang, Yi Zhou, Jungong Han, Guiguang Ding, Jian-jun Sun
13 Mar 2022

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy [ViT]
Tianlong Chen, Zhenyu (Allen) Zhang, Yu Cheng, Ahmed Hassan Awadallah, Zhangyang Wang
12 Mar 2022

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022

Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice [ViT]
Peihao Wang, Wenqing Zheng, Tianlong Chen, Zhangyang Wang
09 Mar 2022

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning [FedML]
Jie Zhu, Shenggui Li, Yang You
24 Feb 2022

Revisiting Over-smoothing in BERT from the Perspective of Graph
Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
17 Feb 2022

The Quarks of Attention [GNN]
Pierre Baldi, Roman Vershynin
15 Feb 2022

On the Origins of the Block Structure Phenomenon in Neural Network Representations
Thao Nguyen, M. Raghu, Simon Kornblith
15 Feb 2022

Video Transformers: A Survey [ViT]
Javier Selva, A. S. Johansen, Sergio Escalera, Kamal Nasrollahi, T. Moeslund, Albert Clapés
16 Jan 2022

A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, Dawei Song
14 Jan 2022

Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence [ViT]
Wenchi Ma, Tianxiao Zhang, Guanghui Wang
26 Dec 2021

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
Yifan Chen, Qi Zeng, Dilek Z. Hakkani-Tür, Di Jin, Heng Ji, Yun Yang
10 Dec 2021

Dynamic Graph Learning-Neural Network for Multivariate Time Series Modeling [AI4TS]
Zhuoling Li, Gaowei Zhang, Lingyu Xu, Jie Yu
06 Dec 2021

Graph Conditioned Sparse-Attention for Improved Source Code Understanding
Junyan Cheng, Iordanis Fostiropoulos, Barry W. Boehm
01 Dec 2021

Pruning Self-attentions into Convolutional Layers in Single Path [ViT]
Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, Dacheng Tao, Bohan Zhuang
23 Nov 2021

MetaFormer Is Actually What You Need for Vision
Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
22 Nov 2021