ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.11052
  4. Cited By
Convexifying Transformers: Improving optimization and understanding of
  transformer networks

Convexifying Transformers: Improving optimization and understanding of transformer networks

20 November 2022
Tolga Ergen
Behnam Neyshabur
Harsh Mehta
    MLT
ArXivPDFHTML

Papers citing "Convexifying Transformers: Improving optimization and understanding of transformer networks"

8 / 8 papers shown
Title
Implicit Bias and Fast Convergence Rates for Self-attention
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
34
13
0
08 Feb 2024
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
Aaron Mishkin
Arda Sahiner
Mert Pilanci
OffRL
77
30
0
02 Feb 2022
Path Regularization: A Convexity and Sparsity Inducing Regularization
  for Parallel ReLU Networks
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Tolga Ergen
Mert Pilanci
29
16
0
18 Oct 2021
Parallel Deep Neural Networks Have Zero Duality Gap
Parallel Deep Neural Networks Have Zero Duality Gap
Yifei Wang
Tolga Ergen
Mert Pilanci
79
10
0
13 Oct 2021
Is Attention Better Than Matrix Decomposition?
Is Attention Better Than Matrix Decomposition?
Zhengyang Geng
Meng-Hao Guo
Hongxu Chen
Xia Li
Ke Wei
Zhouchen Lin
59
137
0
09 Sep 2021
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
271
2,603
0
04 May 2021
Fourier Neural Operator for Parametric Partial Differential Equations
Fourier Neural Operator for Parametric Partial Differential Equations
Zong-Yi Li
Nikola B. Kovachki
Kamyar Azizzadenesheli
Burigede Liu
K. Bhattacharya
Andrew M. Stuart
Anima Anandkumar
AI4CE
217
2,287
0
18 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
1