Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
Neural Information Processing Systems (NeurIPS), 2021. 29 October 2021. arXiv:2111.00035.

Papers citing "Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method" (48 papers)

Connecting Domains and Contrasting Samples: A Ladder for Domain Generalization
Tianxin Wei, Yifan Chen, Xinrui He, Wenxuan Bao, Jingrui He
Knowledge Discovery and Data Mining (KDD), 2025. 19 Oct 2025.

LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
Ashkan Shahbazi, Chayne Thrash, Yikun Bai, Keaton Hamm, Navid Naderializadeh, Soheil Kolouri
27 Sep 2025.

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Cheng Li, Jiexiong Liu, Yixuan Chen, Jie Ji
05 Sep 2025. [MoE]

Revisiting associative recall in modern recurrent models
Destiny Okpekpe, Antonio Orvieto
26 Aug 2025.

PGKET: A Photonic Gaussian Kernel Enhanced Transformer
Ren-Xin Zhao
25 Jul 2025. [ViT]

A unified framework for establishing the universal approximation of transformer-type architectures
Jingpu Cheng, T. Lin, Zuowei Shen, Qianxiao Li
30 Jun 2025.

A Framework for Non-Linear Attention via Modern Hopfield Networks
Ahmed Farooq
21 May 2025.

Embedding Empirical Distributions for Computing Optimal Transport Maps
Mingchen Jiang, Peng Xu, Xichen Ye, Xiaohui Chen, Yun Yang, Yifan Chen
International Symposium on Information Theory (ISIT), 2025. 24 Apr 2025. [OT]

Riemannian Optimization on Relaxed Indicator Matrix Manifold
Jinghui Yuan, Fangyuan Xie, Feiping Nie, Xuelong Li
26 Mar 2025.

Fixed-Point RNNs: Interpolating from Diagonal to Dense
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto
13 Mar 2025.

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration
Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, G. Varatkar, B. Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He
Knowledge Discovery and Data Mining (KDD), 2025. 10 Mar 2025. [MoE]

Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Jung Hee Cheon, Ernest K. Ryu
International Conference on Learning Representations (ICLR), 2024. 24 Feb 2025.

PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li, Shihong Deng, Zheng Zhang
International Conference on Learning Representations (ICLR), 2025. 25 Jan 2025.

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang
International Conference on Learning Representations (ICLR), 2025. 24 Jan 2025.

Unraveling the Gradient Descent Dynamics of Transformers
Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong
Neural Information Processing Systems (NeurIPS), 2024. 12 Nov 2024. [AI4CE]

Kernel Approximation using Analog In-Memory Computing
Julian Büchel, Giacomo Camposampiero, A. Vasilopoulos, Corey Lammie, Abbas Rahimi, Abu Sebastian
05 Nov 2024.

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü, Destiny Okpekpe, Antonio Orvieto
31 Oct 2024. [Mamba]

Towards LifeSpan Cognitive Systems
Yu Wang, Chi Han, Tongtong Wu, Xiaoxin He, Wangchunshu Zhou, ..., Zexue He, Wei Wang, Gholamreza Haffari, Heng Ji, Julian McAuley
20 Sep 2024. [KELM, CLL]

Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva, M. Nayyeri, Steffen Staab
08 Sep 2024.

ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement
Eashan Adhikarla, Kai Zhang, John Nicholson, Brian D. Davison
19 Aug 2024. [Mamba]

Spectraformer: A Unified Random Feature Framework for Transformer
Duke Nguyen, Du Yin, Aditya Joshi, Flora D. Salim
24 May 2024.

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu
04 Apr 2024.

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré
06 Feb 2024.

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024.

Linear Log-Normal Attention with Unbiased Concentration
Yury Nahshan, Dor-Joseph Kampeas, E. Haleva
International Conference on Learning Representations (ICLR), 2023. 22 Nov 2023.

Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
Ayan Sengupta, Md. Shad Akhtar, Tanmoy Chakraborty
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 22 Oct 2023.

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Chi Han, Qifan Wang, Yuan Yao, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang
North American Chapter of the Association for Computational Linguistics (NAACL), 2023. 30 Aug 2023.

Exploring Transformer Extrapolation
Zhen Qin, Yiran Zhong, Huiyuan Deng
AAAI Conference on Artificial Intelligence (AAAI), 2023. 19 Jul 2023.

FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
Ahan Gupta, Hao Guo, Yueming Yuan, Yan-Quan Zhou, Charith Mendis
27 Jun 2023.

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation
Yingyi Chen, Qinghua Tao, F. Tonin, Johan A. K. Suykens
Neural Information Processing Systems (NeurIPS), 2023. 31 May 2023.

SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
Alexander Moreno, Jonathan Mei, Luke Walters
15 May 2023.

Toeplitz Neural Network for Sequence Modeling
Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong
International Conference on Learning Representations (ICLR), 2023. 08 May 2023. [AI4TS, ViT]

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong
09 Feb 2023.

KDEformer: Accelerating Transformers via Kernel Density Estimation
A. Zandieh, Insu Han, Majid Daliri, Amin Karbasi
International Conference on Machine Learning (ICML), 2023. 05 Feb 2023.

TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Modality
Yinsong Wang, Shahin Shahrampour
04 Feb 2023.

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention
Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022. 24 Nov 2022.

Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning
Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin, Dilek Z. Hakkani-Tür
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 26 Oct 2022.

The Devil in Linear Transformer
Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 19 Oct 2022.

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Dianbo Sui
International Conference on Machine Learning (ICML), 2022. 14 Oct 2022. [3DV]

Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries
Chao Ma, Lexing Ying
Journal of Machine Learning (JML), 2022. 13 Oct 2022.

Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç
Signal, Image and Video Processing (SIVP), 2022. 26 Sep 2022.

On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde
International Conference on Algorithmic Learning Theory (ALT), 2022. 11 Sep 2022.

Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks
Yifan Chen, Tianning Xu, Dilek Z. Hakkani-Tür, Di Jin, Yun Yang, Ruoqing Zhu
01 Jun 2022.

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
Neural Information Processing Systems (NeurIPS), 2022. 20 May 2022.

Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin, Dilek Z. Hakkani-Tür
07 May 2022.

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022. [3DV]

Linear Complexity Randomized Self-attention Mechanism
Lin Zheng, Chong-Jun Wang, Lingpeng Kong
International Conference on Machine Learning (ICML), 2022. 10 Apr 2022.

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
Yifan Chen, Qi Zeng, Dilek Z. Hakkani-Tür, Di Jin, Heng Ji, Yun Yang
10 Dec 2021.