The Lipschitz Constant of Self-Attention
arXiv:2006.04710
Hyunjik Kim, George Papamakarios, A. Mnih
8 June 2020

Papers citing "The Lipschitz Constant of Self-Attention" (23 papers shown)
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
  Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu (07 May 2025)

Scaled Supervision is an Implicit Lipschitz Regularizer
  Z. Ouyang, Chunhui Zhang, Yaning Jia, Soroush Vosoughi (19 Mar 2025)

Activating Self-Attention for Multi-Scene Absolute Pose Regression
  Miso Lee, Jihwan Kim, Jae-Pil Heo (03 Nov 2024)

Law of Vision Representation in MLLMs
  Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu (29 Aug 2024)

Attention Beats Linear for Fast Implicit Neural Representation Generation
  Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang (22 Jul 2024)

Imitation Learning Inputting Image Feature to Each Layer of Neural Network
  Koki Yamane, S. Sakaino, T. Tsuji (18 Jan 2024)

How Smooth Is Attention?
  Valérie Castin, Pierre Ablin, Gabriel Peyré (22 Dec 2023)

AdaLomo: Low-memory Optimization with Adaptive Learning Rate
  Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu (16 Oct 2023)

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
  Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen (12 Feb 2023)

Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation
  Ziwei Fan, Zhiwei Liu, Hao Peng, Philip S. Yu (28 Jan 2023)

Graph Neural Network Based Node Deployment for Throughput Enhancement
  Yifei Yang, Dongmian Zou, Xiaofan He (19 Aug 2022)

Online Video Instance Segmentation via Robust Context Fusion
  Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu (12 Jul 2022)

Automated Progressive Learning for Efficient Training of Vision Transformers
  Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang (28 Mar 2022)

Semi-Discrete Normalizing Flows through Differentiable Tessellation
  Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel (14 Mar 2022)

Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
  Peihao Wang, Wenqing Zheng, Tianlong Chen, Zhangyang Wang (09 Mar 2022)

Sinkformers: Transformers with Doubly Stochastic Attention
  Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré (22 Oct 2021)

Inductive Biases and Variable Creation in Self-Attention Mechanisms
  Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang (19 Oct 2021)

Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
  E. M. Achour, François Malgouyres, Franck Mamalet (12 Aug 2021)

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
  Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang (09 Jun 2021)

Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
  Sam Bond-Taylor, Adam Leach, Yang Long, Chris G. Willcocks (08 Mar 2021)

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
  Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas (05 Mar 2021)

A case for new neural network smoothness constraints
  Mihaela Rosca, T. Weber, A. Gretton, S. Mohamed (14 Dec 2020)

On the Computational Power of Transformers and its Implications in Sequence Modeling
  S. Bhattamishra, Arkil Patel, Navin Goyal (16 Jun 2020)