Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
Neural Information Processing Systems (NeurIPS), 2021. 29 October 2021. arXiv:2111.00035.

Papers citing "Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method" (48 papers)

Connecting Domains and Contrasting Samples: A Ladder for Domain Generalization
Tianxin Wei, Yifan Chen, Xinrui He, Wenxuan Bao, Jingrui He
Knowledge Discovery and Data Mining (KDD), 2025. 19 Oct 2025.

LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
Ashkan Shahbazi, Chayne Thrash, Yikun Bai, Keaton Hamm, Navid Naderializadeh, Soheil Kolouri
27 Sep 2025.

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Cheng Li, Jiexiong Liu, Yixuan Chen, Jie Ji
05 Sep 2025. [MoE]

Revisiting associative recall in modern recurrent models
Destiny Okpekpe, Antonio Orvieto
26 Aug 2025.

PGKET: A Photonic Gaussian Kernel Enhanced Transformer
Ren-Xin Zhao
25 Jul 2025. [ViT]

A unified framework for establishing the universal approximation of transformer-type architectures
Jingpu Cheng, T. Lin, Zuowei Shen, Qianxiao Li
30 Jun 2025.

A Framework for Non-Linear Attention via Modern Hopfield Networks
Ahmed Farooq
21 May 2025.

Embedding Empirical Distributions for Computing Optimal Transport Maps
Mingchen Jiang, Peng Xu, Xichen Ye, Xiaohui Chen, Yun Yang, Yifan Chen
International Symposium on Information Theory (ISIT), 2025. 24 Apr 2025. [OT]

Riemannian Optimization on Relaxed Indicator Matrix Manifold
Jinghui Yuan, Fangyuan Xie, Feiping Nie, Xuelong Li
26 Mar 2025.

Fixed-Point RNNs: Interpolating from Diagonal to Dense
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto
13 Mar 2025.

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration
Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, G. Varatkar, B. Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He
Knowledge Discovery and Data Mining (KDD), 2025. 10 Mar 2025. [MoE]

Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Jung Hee Cheon, Ernest K. Ryu
International Conference on Learning Representations (ICLR), 2024. 24 Feb 2025.

PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li, Shihong Deng, Zheng Zhang
International Conference on Learning Representations (ICLR), 2025. 25 Jan 2025.

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang
International Conference on Learning Representations (ICLR), 2025. 24 Jan 2025.

Unraveling the Gradient Descent Dynamics of Transformers
Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong
Neural Information Processing Systems (NeurIPS), 2024. 12 Nov 2024. [AI4CE]

Kernel Approximation using Analog In-Memory Computing
Julian Büchel, Giacomo Camposampiero, A. Vasilopoulos, Corey Lammie, Abbas Rahimi, Abu Sebastian
05 Nov 2024.

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü, Destiny Okpekpe, Antonio Orvieto
31 Oct 2024. [Mamba]

Towards LifeSpan Cognitive Systems
Yu Wang, Chi Han, Tongtong Wu, Xiaoxin He, Wangchunshu Zhou, ..., Zexue He, Wei Wang, Gholamreza Haffari, Heng Ji, Julian McAuley
20 Sep 2024. [KELM, CLL]

Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva, M. Nayyeri, Steffen Staab
08 Sep 2024.

ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement
Eashan Adhikarla, Kai Zhang, John Nicholson, Brian D. Davison
19 Aug 2024. [Mamba]

Spectraformer: A Unified Random Feature Framework for Transformer
Duke Nguyen, Du Yin, Aditya Joshi, Flora D. Salim
24 May 2024.

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu
04 Apr 2024.

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré
06 Feb 2024.

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024.

Linear Log-Normal Attention with Unbiased Concentration
Yury Nahshan, Dor-Joseph Kampeas, E. Haleva
International Conference on Learning Representations (ICLR), 2023. 22 Nov 2023.

Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
Ayan Sengupta, Md. Shad Akhtar, Tanmoy Chakraborty
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 22 Oct 2023.

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Chi Han, Qifan Wang, Yuan Yao, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang
North American Chapter of the Association for Computational Linguistics (NAACL), 2023. 30 Aug 2023.

Exploring Transformer Extrapolation
Zhen Qin, Yiran Zhong, Huiyuan Deng
AAAI Conference on Artificial Intelligence (AAAI), 2023. 19 Jul 2023.

FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
Ahan Gupta, Hao Guo, Yueming Yuan, Yan-Quan Zhou, Charith Mendis
27 Jun 2023.

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation
Yingyi Chen, Qinghua Tao, F. Tonin, Johan A. K. Suykens
Neural Information Processing Systems (NeurIPS), 2023. 31 May 2023.

SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
Alexander Moreno, Jonathan Mei, Luke Walters
15 May 2023.

Toeplitz Neural Network for Sequence Modeling
Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong
International Conference on Learning Representations (ICLR), 2023. 08 May 2023. [AI4TS, ViT]

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong
09 Feb 2023.

KDEformer: Accelerating Transformers via Kernel Density Estimation
A. Zandieh, Insu Han, Majid Daliri, Amin Karbasi
International Conference on Machine Learning (ICML), 2023. 05 Feb 2023.

TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Modality
Yinsong Wang, Shahin Shahrampour
04 Feb 2023.

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention
Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022. 24 Nov 2022.

Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning
Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin, Dilek Z. Hakkani-Tür
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 26 Oct 2022.

The Devil in Linear Transformer
Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 19 Oct 2022.

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Dianbo Sui
International Conference on Machine Learning (ICML), 2022. 14 Oct 2022. [3DV]

Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries
Chao Ma, Lexing Ying
Journal of Machine Learning (JML), 2022. 13 Oct 2022.

Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç
Signal, Image and Video Processing (SIVP), 2022. 26 Sep 2022.

On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde
International Conference on Algorithmic Learning Theory (ALT), 2022. 11 Sep 2022.

Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks
Yifan Chen, Tianning Xu, Dilek Z. Hakkani-Tür, Di Jin, Yun Yang, Ruoqing Zhu
01 Jun 2022.

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
Neural Information Processing Systems (NeurIPS), 2022. 20 May 2022.

Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin, Dilek Z. Hakkani-Tür
07 May 2022.

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022. [3DV]

Linear Complexity Randomized Self-attention Mechanism
Lin Zheng, Chong-Jun Wang, Lingpeng Kong
International Conference on Machine Learning (ICML), 2022. 10 Apr 2022.

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
Yifan Chen, Qi Zeng, Dilek Z. Hakkani-Tür, Di Jin, Heng Ji, Yun Yang
10 Dec 2021.