Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

International Conference on Machine Learning (ICML), 2022
17 May 2022
Arda Sahiner
Tolga Ergen
Batu Mehmet Ozturkler
John M. Pauly
Morteza Mardani
Mert Pilanci

Papers citing "Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers"

28 citing papers
A Capacity-Based Rationale for Multi-Head Attention
Micah Adler
26 Sep 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
06 Apr 2025
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Neural Information Processing Systems (NeurIPS), 2024
Yunzhe Hu
Difan Zou
Dong Xu
26 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Neural Information Processing Systems (NeurIPS), 2024
Xuechen Zhang
Xiangyu Chang
Mingchen Li
Amit K. Roy-Chowdhury
Jiasi Chen
Samet Oymak
19 Nov 2024
Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization
Prateek Varshney
Mert Pilanci
09 Oct 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
19 Jun 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
24 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
08 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Neural Information Processing Systems (NeurIPS), 2024
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
06 Feb 2024
Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
International Conference on Machine Learning (ICML), 2024
Sungyoon Kim
Mert Pilanci
06 Feb 2024
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models
Tolga Ergen
Mert Pilanci
19 Dec 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
19 Oct 2023
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
Neural Information Processing Systems (NeurIPS), 2023
Rajat Vadiraj Dwaraknath
Tolga Ergen
Mert Pilanci
26 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
31 Aug 2023
Max-Margin Token Selection in Attention Mechanism
Neural Information Processing Systems (NeurIPS), 2023
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
23 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
18 Jun 2023
On the Role of Attention in Prompt-tuning
International Conference on Machine Learning (ICML), 2023
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
06 Jun 2023
Memorization Capacity of Multi-Head Attention in Transformers
International Conference on Learning Representations (ICLR), 2023
Sadegh Mahdavi
Renjie Liao
Christos Thrampoulidis
03 Jun 2023
Understanding MLP-Mixer as a Wide and Sparse MLP
International Conference on Machine Learning (ICML), 2023
Tomohiro Hayase
Ryo Karakida
02 Jun 2023
Multiscale Attention via Wavelet Neural Operators for Vision Transformers
Anahita Nekoozadeh
M. Ahmadzadeh
Zahra Mardani
22 Mar 2023
Globally Optimal Training of Neural Networks with Threshold Activation Functions
International Conference on Learning Representations (ICLR), 2023
Tolga Ergen
Halil Ibrahim Gulluk
Jonathan Lacotte
Mert Pilanci
06 Mar 2023
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Matthew Walmer
Saksham Suri
Kamal Gupta
Abhinav Shrivastava
07 Dec 2022
Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen
Behnam Neyshabur
Harsh Mehta
20 Nov 2022
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
International Conference on Machine Learning (ICML), 2022
Mudit Gaur
Vaneet Aggarwal
Mridul Agarwal
14 Nov 2022
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
International Conference on Machine Learning (ICML), 2022
Aaron Mishkin
Arda Sahiner
Mert Pilanci
02 Feb 2022
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Tolga Ergen
Mert Pilanci
18 Oct 2021
Parallel Deep Neural Networks Have Zero Duality Gap
Yifei Wang
Tolga Ergen
Mert Pilanci
13 Oct 2021
On the Relationship between Self-Attention and Convolutional Layers
International Conference on Learning Representations (ICLR), 2019
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
08 Nov 2019