Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
arXiv:2103.03404, 5 March 2021
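The headline claim is easy to reproduce numerically: stack softmax self-attention layers with no skip connections or MLPs, and the token representations collapse toward a rank-1 matrix (all rows identical) within a handful of layers, while adding a skip connection prevents the collapse. Below is a minimal NumPy sketch of that experiment, not the authors' code: the sizes, random weight scaling, and single-head setup are illustrative assumptions, and the rank-1 residual is approximated by subtracting the mean token from every row.

```python
# Minimal sketch (not the authors' code) of the paper's headline claim:
# a pure attention stack loses rank with depth, while a skip connection
# prevents the collapse. All sizes and weight scales are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 32, 64, 12  # tokens, width, layers (illustrative values)

def attention(X, Wq, Wk, Wv):
    # One head of standard softmax self-attention.
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ (X @ Wv)

def rel_rank1_residual(X):
    # Relative distance to the nearest matrix with identical rows
    # (rank <= 1), approximating the residual the paper tracks.
    return np.linalg.norm(X - X.mean(axis=0)) / np.linalg.norm(X)

X0 = rng.standard_normal((n, d)) / np.sqrt(d)
X_pure, X_skip = X0.copy(), X0.copy()
for layer in range(1, depth + 1):
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    X_pure = attention(X_pure, Wq, Wk, Wv)           # attention only
    X_skip = X_skip + attention(X_skip, Wq, Wk, Wv)  # with skip connection
    print(f"layer {layer:2d}  pure: {rel_rank1_residual(X_pure):.2e}"
          f"  skip: {rel_rank1_residual(X_skip):.2e}")
```

In runs of this sketch the "pure" column plummets toward machine precision within a few layers, consistent with the doubly exponential convergence the paper proves, while the "skip" column stays of order one.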
Papers citing "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth" (showing 50 of 238):
Always Skip Attention
Yiping Ji, Hemanth Saratchandran, Peyman Moghaddam, Simon Lucey
04 May 2025

LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou
01 May 2025

Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren, Yong Liu
26 Apr 2025

MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
Fengwei Zhou, Jiafei Song, Wenjin Jason Li, Gengjian Xue, Zhikang Zhao, Yichao Lu, Bailin Na
23 Apr 2025

Quantum Doubly Stochastic Transformers
Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk
22 Apr 2025

Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan
18 Apr 2025

Defending Against Frequency-Based Attacks with Diffusion Models [AAML]
Fatemeh Amerehi, Patrick Healy
15 Apr 2025

DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You
09 Apr 2025

Fourier Feature Attribution: A New Efficiency Attribution Method [FAtt]
Zechen Liu, Feiyang Zhang, Wei Song, X. Li, Wei Wei
02 Apr 2025

Filtering with Time-frequency Analysis: An Adaptive and Lightweight Model for Sequential Recommender Systems Based on Discrete Wavelet Transform [AI4TS]
Sheng Lu, Mingxi Ge, Jiuyi Zhang, Wanli Zhu, Guanjin Li, Fangming Gu
30 Mar 2025

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao, Wei Chen, Qiang Qiu
24 Mar 2025

Temporal Action Detection Model Compression by Progressive Block Drop
Xiaoyong Chen, Yong Guo, Jiaming Liang, Sitong Zhuang, Runhao Zeng, Xiping Hu
21 Mar 2025

Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability [LRM]
Chenhui Xu, Dancheng Liu, Jiajie Li, Amir Nassereldine, Zhaohui Li, Jinjun Xiong
05 Mar 2025

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen
02 Mar 2025

Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks [OOD]
Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Müller, Jan Tönshoff, ..., Michael M. Bronstein, Mathias Niepert, Bryan Perozzi, Mikhail Galkin, Christopher Morris
21 Feb 2025

Hyperspherical Energy Transformer with Recurrent Depth
Yunzhe Hu, Difan Zou, Dong Xu
17 Feb 2025

Pre-train and Fine-tune: Recommenders as Large Models
Zhenhao Jiang, C. L. P. Chen, Hao Feng, Yu Yang, Jin Liu, Jie Zhang, Jia Jia, Ning Hu
24 Jan 2025

Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang, Qianxiao Li
03 Jan 2025

Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer
Ziyang Chen, Yongjun Zhang, Wenting Li, Bingshu Wang, Yabo Wu, Yong Zhao, C. L. P. Chen
02 Jan 2025

PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging [3DPC]
Mattias Paul Heinrich
23 Dec 2024

Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory [DiffM, VGen]
Xingyao Li, Fengzhuo Zhang, Jiachun Pan, Yunlong Hou, Vincent Y. F. Tan, Zhuoran Yang
23 Dec 2024

Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification [AI4TS]
Yudong Han, Haocong Wang, Yupeng Hu, Yongshun Gong, Xuemeng Song, Weili Guan
17 Dec 2024

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration [VGen]
Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, Dacheng Tao
16 Dec 2024

The Asymptotic Behavior of Attention in Transformers
Álvaro Rodríguez Abella, João Pedro Silvestre, Paulo Tabuada
03 Dec 2024

Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation
S. Ly, Hien Nguyen
28 Nov 2024

Layer Pruning with Consensus: A Triple-Win Solution
Leandro Giusti Mugnaini, Carolina Tavares Duarte, Anna H. Reali Costa, Artur Jordao
21 Nov 2024

A Theory for Compressibility of Graph Transformers for Transductive Learning
Hamed Shirzad, Honghao Lin, A. Velingker, B. Venkatachalam, David P. Woodruff, Danica J. Sutherland
20 Nov 2024

Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang, Xiangyu Chang, Mingchen Li, A. Roy-Chowdhury, J. Chen, Samet Oymak
19 Nov 2024

Clustering in Causal Attention Masking
Nikita Karagodin, Yury Polyanskiy, Philippe Rigollet
07 Nov 2024

Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective [ViT]
Qishuai Wen, Chun-Guang Li
05 Nov 2024

Activating Self-Attention for Multi-Scene Absolute Pose Regression [ViT]
Miso Lee, Jihwan Kim, Jae-Pil Heo
03 Nov 2024

RAM: Replace Attention with MLP for Efficient Multivariate Time Series Forecasting [AI4TS]
Suhan Guo, Jiahong Deng, Yi Wei, Hui Dou, F. Shen, Jian Zhao
31 Oct 2024

LSEAttention is All You Need for Time Series Forecasting [AI4TS]
Dizhen Liang
31 Oct 2024

Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim, Gusang Lee, Kyuhong Shim, B. Shim
29 Oct 2024

Provable optimal transport with transformers: The essence of depth and prompt engineering [OT]
Hadi Daneshmand
25 Oct 2024

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization [DiffM]
Haowei Zhu, Dehua Tang, Ji Liu, Mingjie Lu, Jintu Zheng, ..., Spandan Tiwari, Ashish Sirasao, Jun-Hai Yong, Bin Wang, E. Barsoum
22 Oct 2024

Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo, Heeyoul Choi
21 Oct 2024

Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan, Hongteng Xu
14 Oct 2024

Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph, Jerome Sieber, M. Zeilinger, Carmen Amo Alonso
14 Oct 2024

t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving
Pengfei Hu, Yuhang Qian, Tianyue Zheng, Ang Li, Zhe Chen, Yue Gao, Xiuzhen Cheng, Jun-Jie Luo
13 Oct 2024

Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling [MedIm, AI4CE]
Alessio Fallani, Ramil I. Nugmanov, Jose A. Arjona-Medina, Jörg Kurt Wegner, Alexandre Tkatchenko, Kostiantyn Chernichenko
10 Oct 2024

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang
09 Oct 2024

Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective [CLL]
Xueying Bai, Yifan Sun, Niranjan Balasubramanian
08 Oct 2024

Dynamic Diffusion Transformer
Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You
04 Oct 2024

MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion [DiffM]
Lehong Wu, Lilang Lin, Jiahang Zhang, Y. Ma, Jiaying Liu
16 Sep 2024

Increasing transformer token length with a Maximum Entropy Principle Method
R. I. Cukier
17 Aug 2024

Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models [LRM]
Georgy Tyukin, G. Dovonon, Jean Kaddour, Pasquale Minervini
22 Jul 2024

Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [CLL]
Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Yihong Gong
14 Jul 2024

Adaptive Parametric Activation
Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, Shan Luo
11 Jul 2024

Reasoning in Large Language Models: A Geometric Perspective [LRM]
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024