Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
arXiv:1908.11775 · 30 August 2019
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Papers citing "Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel"
40 papers shown
A Reproduction Study: The Kernel PCA Interpretation of Self-Attention Fails Under Scrutiny
Karahan Sarıtaş, Çağatay Yıldız
12 May 2025 · 0 citations

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen
02 Mar 2025 · 2 citations

Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao, Zhao-quan Song, Chiwun Yang
01 Feb 2025 · 2 citations · VGen

Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao
11 Jan 2025 · 9 citations

Key-value memory in the brain
Samuel J. Gershman, Ila Fiete, Kazuki Irie
06 Jan 2025 · 7 citations

Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song
03 Jan 2025 · 11 citations

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng
08 Nov 2024 · 2 citations · KELM

Context-Scaling versus Task-Scaling in In-Context Learning
Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin
16 Oct 2024 · 3 citations · ReLM, LRM

How Effective are State Space Models for Machine Translation?
Hugo Pitorro, Pavlo Vasylenko, Marcos Vinícius Treviso, André F. T. Martins
07 Jul 2024 · 2 citations · Mamba

DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang
29 Mar 2024 · 5 citations · VLM

Data-free Weight Compress and Denoise for Large Language Models
Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin
26 Feb 2024 · 1 citation

Breaking Symmetry When Training Transformers
Chunsheng Zuo, Michael Guerzhoy
06 Feb 2024 · 0 citations

DF2: Distribution-Free Decision-Focused Learning
Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Prakash, Bo Dai, Chao Zhang
11 Aug 2023 · 1 citation · OffRL

Inductive biases in deep learning models for weather prediction
Jannik Thümmel, Matthias Karlbauer, S. Otte, C. Zarfl, Georg Martius, ..., Thomas Scholten, Ulrich Friedrich, V. Wulfmeyer, B. Goswami, Martin Volker Butz
06 Apr 2023 · 5 citations · AI4CE

Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
K. Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamás Sarlós, Thomas Weingarten, Adrian Weller
03 Feb 2023 · 8 citations

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang
30 Dec 2022 · 11 citations

HigeNet: A Highly Efficient Modeling for Long Sequence Time Series Prediction in AIOps
Jiajia Li, Feng Tan, Cheng He, Zikai Wang, Haitao Song, Lingfei Wu, Pengwei Hu
13 Nov 2022 · 0 citations

Transformer Meets Boundary Value Inverse Problems
Ruchi Guo, Shuhao Cao, Long Chen
29 Sep 2022 · 20 citations · MedIm

Features Fusion Framework for Multimodal Irregular Time-series Events
Peiwang Tang, Xianchao Zhang
05 Sep 2022 · 2 citations · AI4TS

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
01 Aug 2022 · 9 citations

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
20 May 2022 · 65 citations

Approximating Permutations with Neural Network Components for Travelling Photographer Problem
S. Chong
30 Apr 2022 · 0 citations

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
11 Apr 2022 · 6 citations

Wasserstein Adversarial Transformer for Cloud Workload Prediction
Shivani Arbat, V. Jayakumar, Jaewoo Lee, Wei Wang, I. Kim
12 Mar 2022 · 22 citations · AI4TS

cosFormer: Rethinking Softmax in Attention
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
17 Feb 2022 · 211 citations

Learning Operators with Coupled Attention
Georgios Kissas, Jacob H. Seidman, Leonardo Ferreira Guilhoto, V. Preciado, George J. Pappas, P. Perdikaris
04 Jan 2022 · 109 citations

Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture
Kieran Wood, Sven Giegerich, Stephen J. Roberts, S. Zohren
16 Dec 2021 · 21 citations · AI4TS, AIFin

Transformers for prompt-level EMA non-response prediction
Supriya Nagesh, Alexander Moreno, Stephanie M Carpenter, Jamie Yap, Soujanya Chatterjee, ..., Santosh Kumar, Cho Lam, D. Wetter, Inbal Nahum-Shani, James M. Rehg
01 Nov 2021 · 0 citations

Ultra-high Resolution Image Segmentation via Locality-aware Context Fusion and Alternating Local Enhancement
Wenxi Liu, Qi Li, Xin Lin, Weixiang Yang, Shengfeng He, Yuanlong Yu
06 Sep 2021 · 7 citations

GraphiT: Encoding Graph Structure in Transformers
Grégoire Mialon, Dexiong Chen, Margot Selosse, Julien Mairal
10 Jun 2021 · 163 citations

CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
09 Jun 2021 · 1,167 citations · ViT

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
08 Jun 2021 · 1,086 citations · ViT

Choose a Transformer: Fourier or Galerkin
Shuhao Cao
31 May 2021 · 221 citations

Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard
18 May 2021 · 44 citations

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
22 Feb 2021 · 224 citations

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
30 Sep 2020 · 1,517 citations

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
16 Jun 2020 · 63 citations

The Lipschitz Constant of Self-Attention
Hyunjik Kim, George Papamakarios, A. Mnih
08 Jun 2020 · 134 citations

Kernel Self-Attention in Deep Multiple Instance Learning
Dawid Rymarczyk, Adriana Borowa, Jacek Tabor, Bartosz Zieliński
25 May 2020 · 5 citations · SSL

Classical Structured Prediction Losses for Sequence to Sequence Learning
Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato
14 Nov 2017 · 185 citations · AIMat