The Lipschitz Constant of Self-Attention
Hyunjik Kim, George Papamakarios, Andriy Mnih · 8 June 2020
arXiv:2006.04710
Papers citing "The Lipschitz Constant of Self-Attention" (23 papers)
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu · 7 May 2025
Scaled Supervision is an Implicit Lipschitz Regularizer
Z. Ouyang, Chunhui Zhang, Yaning Jia, Soroush Vosoughi · 19 Mar 2025
Activating Self-Attention for Multi-Scene Absolute Pose Regression
Miso Lee, Jihwan Kim, Jae-Pil Heo · 3 Nov 2024
Law of Vision Representation in MLLMs
Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu · 29 Aug 2024
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang · 22 Jul 2024
Imitation Learning Inputting Image Feature to Each Layer of Neural Network
Koki Yamane, S. Sakaino, T. Tsuji · 18 Jan 2024
How Smooth Is Attention?
Valérie Castin, Pierre Ablin, Gabriel Peyré · 22 Dec 2023
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu · 16 Oct 2023
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen · 12 Feb 2023
Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation
Ziwei Fan, Zhiwei Liu, Hao Peng, Philip S. Yu · 28 Jan 2023
Graph Neural Network Based Node Deployment for Throughput Enhancement
Yifei Yang, Dongmian Zou, Xiaofan He · 19 Aug 2022
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu · 12 Jul 2022
Automated Progressive Learning for Efficient Training of Vision Transformers
Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang · 28 Mar 2022
Semi-Discrete Normalizing Flows through Differentiable Tessellation
Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel · 14 Mar 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
Peihao Wang, Wenqing Zheng, Tianlong Chen, Zhangyang Wang · 9 Mar 2022
Sinkformers: Transformers with Doubly Stochastic Attention
Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré · 22 Oct 2021
Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang · 19 Oct 2021
Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
E. M. Achour, François Malgouyres, Franck Mamalet · 12 Aug 2021
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang · 9 Jun 2021
Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
Sam Bond-Taylor, Adam Leach, Yang Long, Chris G. Willcocks · 8 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 5 Mar 2021
A case for new neural network smoothness constraints
Mihaela Rosca, T. Weber, A. Gretton, S. Mohamed · 14 Dec 2020
On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal · 16 Jun 2020