The Lipschitz Constant of Self-Attention
arXiv:2006.04710
Hyunjik Kim, George Papamakarios, A. Mnih
8 June 2020

Papers citing "The Lipschitz Constant of Self-Attention" (23 papers shown)
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
  Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu (07 May 2025)

Scaled Supervision is an Implicit Lipschitz Regularizer
  Z. Ouyang, Chunhui Zhang, Yaning Jia, Soroush Vosoughi (19 Mar 2025)

Activating Self-Attention for Multi-Scene Absolute Pose Regression
  Miso Lee, Jihwan Kim, Jae-Pil Heo (03 Nov 2024)

Law of Vision Representation in MLLMs
  Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu (29 Aug 2024)

Attention Beats Linear for Fast Implicit Neural Representation Generation
  Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang (22 Jul 2024)

Imitation Learning Inputting Image Feature to Each Layer of Neural Network
  Koki Yamane, S. Sakaino, T. Tsuji (18 Jan 2024)

How Smooth Is Attention?
  Valérie Castin, Pierre Ablin, Gabriel Peyré (22 Dec 2023)

AdaLomo: Low-memory Optimization with Adaptive Learning Rate
  Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu (16 Oct 2023)

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
  Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen (12 Feb 2023)

Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation
  Ziwei Fan, Zhiwei Liu, Hao Peng, Philip S. Yu (28 Jan 2023)

Graph Neural Network Based Node Deployment for Throughput Enhancement
  Yifei Yang, Dongmian Zou, Xiaofan He (19 Aug 2022)

Online Video Instance Segmentation via Robust Context Fusion
  Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu (12 Jul 2022)

Automated Progressive Learning for Efficient Training of Vision Transformers
  Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang (28 Mar 2022)

Semi-Discrete Normalizing Flows through Differentiable Tessellation
  Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel (14 Mar 2022)

Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
  Peihao Wang, Wenqing Zheng, Tianlong Chen, Zhangyang Wang (09 Mar 2022)

Sinkformers: Transformers with Doubly Stochastic Attention
  Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré (22 Oct 2021)

Inductive Biases and Variable Creation in Self-Attention Mechanisms
  Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang (19 Oct 2021)

Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
  E. M. Achour, François Malgouyres, Franck Mamalet (12 Aug 2021)

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
  Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang (09 Jun 2021)

Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
  Sam Bond-Taylor, Adam Leach, Yang Long, Chris G. Willcocks (08 Mar 2021)

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
  Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas (05 Mar 2021)

A case for new neural network smoothness constraints
  Mihaela Rosca, T. Weber, A. Gretton, S. Mohamed (14 Dec 2020)

On the Computational Power of Transformers and its Implications in Sequence Modeling
  S. Bhattamishra, Arkil Patel, Navin Goyal (16 Jun 2020)