2205.08078
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
International Conference on Machine Learning (ICML), 2022
17 May 2022
Arda Sahiner
Tolga Ergen
Batu Mehmet Ozturkler
John M. Pauly
Morteza Mardani
Mert Pilanci
Papers citing
"Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers"
28 / 28 papers shown
A Capacity-Based Rationale for Multi-Head Attention
Micah Adler
249
0
0
26 Sep 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
290
7
0
06 Apr 2025
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Neural Information Processing Systems (NeurIPS), 2024
Yunzhe Hu
Difan Zou
Dong Xu
420
4
0
26 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Neural Information Processing Systems (NeurIPS), 2024
Xuechen Zhang
Xiangyu Chang
Mingchen Li
Amit K. Roy-Chowdhury
Jiasi Chen
Samet Oymak
299
19
0
19 Nov 2024
Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization
Prateek Varshney
Mert Pilanci
472
0
0
09 Oct 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
239
18
0
19 Jun 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
452
8
0
24 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
533
31
0
08 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Neural Information Processing Systems (NeurIPS), 2024
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
MLT
284
29
0
06 Feb 2024
Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
International Conference on Machine Learning (ICML), 2024
Sungyoon Kim
Mert Pilanci
588
10
0
06 Feb 2024
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models
Tolga Ergen
Mert Pilanci
244
7
0
19 Dec 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
MLT
379
47
0
19 Oct 2023
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
Neural Information Processing Systems (NeurIPS), 2023
Rajat Vadiraj Dwaraknath
Tolga Ergen
Mert Pilanci
337
0
0
26 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
528
66
0
31 Aug 2023
Max-Margin Token Selection in Attention Mechanism
Neural Information Processing Systems (NeurIPS), 2023
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
608
61
0
23 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
301
8
0
18 Jun 2023
On the Role of Attention in Prompt-tuning
International Conference on Machine Learning (ICML), 2023
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
MLT
LRM
267
66
0
06 Jun 2023
Memorization Capacity of Multi-Head Attention in Transformers
International Conference on Learning Representations (ICLR), 2023
Sadegh Mahdavi
Renjie Liao
Christos Thrampoulidis
596
33
0
03 Jun 2023
Understanding MLP-Mixer as a Wide and Sparse MLP
International Conference on Machine Learning (ICML), 2023
Tomohiro Hayase
Ryo Karakida
MoE
435
8
0
02 Jun 2023
Multiscale Attention via Wavelet Neural Operators for Vision Transformers
Anahita Nekoozadeh
M. Ahmadzadeh
Zahra Mardani
ViT
279
3
0
22 Mar 2023
Globally Optimal Training of Neural Networks with Threshold Activation Functions
International Conference on Learning Representations (ICLR), 2023
Tolga Ergen
Halil Ibrahim Gulluk
Jonathan Lacotte
Mert Pilanci
352
10
0
06 Mar 2023
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Matthew Walmer
Saksham Suri
Kamal Gupta
Abhinav Shrivastava
462
43
0
07 Dec 2022
Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen
Behnam Neyshabur
Harsh Mehta
MLT
260
15
0
20 Nov 2022
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
International Conference on Machine Learning (ICML), 2022
Mudit Gaur
Vaneet Aggarwal
Mridul Agarwal
MLT
465
3
0
14 Nov 2022
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
International Conference on Machine Learning (ICML), 2022
Aaron Mishkin
Arda Sahiner
Mert Pilanci
OffRL
648
35
0
02 Feb 2022
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Tolga Ergen
Mert Pilanci
507
21
0
18 Oct 2021
Parallel Deep Neural Networks Have Zero Duality Gap
Yifei Wang
Tolga Ergen
Mert Pilanci
481
12
0
13 Oct 2021
On the Relationship between Self-Attention and Convolutional Layers
International Conference on Learning Representations (ICLR), 2019
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
809
625
0
08 Nov 2019