Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

International Conference on Machine Learning (ICML), 2022
17 May 2022
Arda Sahiner
Tolga Ergen
Batu Mehmet Ozturkler
John M. Pauly
Morteza Mardani
Mert Pilanci

Papers citing "Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers"

28 citing papers
A Capacity-Based Rationale for Multi-Head Attention
Micah Adler
26 Sep 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
06 Apr 2025
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Neural Information Processing Systems (NeurIPS), 2024
Yunzhe Hu
Difan Zou
Dong Xu
26 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Neural Information Processing Systems (NeurIPS), 2024
Xuechen Zhang
Xiangyu Chang
Mingchen Li
Amit K. Roy-Chowdhury
Jiasi Chen
Samet Oymak
19 Nov 2024
Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization
Prateek Varshney
Mert Pilanci
09 Oct 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
19 Jun 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
24 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
08 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Neural Information Processing Systems (NeurIPS), 2024
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
06 Feb 2024
Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
International Conference on Machine Learning (ICML), 2024
Sungyoon Kim
Mert Pilanci
06 Feb 2024
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models
Tolga Ergen
Mert Pilanci
19 Dec 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
19 Oct 2023
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
Neural Information Processing Systems (NeurIPS), 2023
Rajat Vadiraj Dwaraknath
Tolga Ergen
Mert Pilanci
26 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
31 Aug 2023
Max-Margin Token Selection in Attention Mechanism
Neural Information Processing Systems (NeurIPS), 2023
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
23 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
18 Jun 2023
On the Role of Attention in Prompt-tuning
International Conference on Machine Learning (ICML), 2023
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
06 Jun 2023
Memorization Capacity of Multi-Head Attention in Transformers
International Conference on Learning Representations (ICLR), 2023
Sadegh Mahdavi
Renjie Liao
Christos Thrampoulidis
03 Jun 2023
Understanding MLP-Mixer as a Wide and Sparse MLP
International Conference on Machine Learning (ICML), 2023
Tomohiro Hayase
Ryo Karakida
02 Jun 2023
Multiscale Attention via Wavelet Neural Operators for Vision Transformers
Anahita Nekoozadeh
M. Ahmadzadeh
Zahra Mardani
22 Mar 2023
Globally Optimal Training of Neural Networks with Threshold Activation Functions
International Conference on Learning Representations (ICLR), 2023
Tolga Ergen
Halil Ibrahim Gulluk
Jonathan Lacotte
Mert Pilanci
06 Mar 2023
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Matthew Walmer
Saksham Suri
Kamal Gupta
Abhinav Shrivastava
07 Dec 2022
Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen
Behnam Neyshabur
Harsh Mehta
20 Nov 2022
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
International Conference on Machine Learning (ICML), 2022
Mudit Gaur
Vaneet Aggarwal
Mridul Agarwal
14 Nov 2022
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
International Conference on Machine Learning (ICML), 2022
Aaron Mishkin
Arda Sahiner
Mert Pilanci
02 Feb 2022
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Tolga Ergen
Mert Pilanci
18 Oct 2021
Parallel Deep Neural Networks Have Zero Duality Gap
Yifei Wang
Tolga Ergen
Mert Pilanci
13 Oct 2021
On the Relationship between Self-Attention and Convolutional Layers
International Conference on Learning Representations (ICLR), 2019
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
08 Nov 2019