Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention
arXiv: 2204.10670, 22 April 2022
Tong Yu, Ruslan Khalitov, Lei Cheng, Zhirong Yang
Papers citing "Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention" (6 of 6 papers shown)
Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks
Hangcheng Cao, Longzhi Yuan, Guowen Xu, Ziyang He, Zhengru Fang, Yuguang Fang (06 Sep 2024)
Self-Distillation Improves DNA Sequence Inference
Tong Yu, Lei Cheng, Ruslan Khalitov, Erland Brandser Olsson, Zhirong Yang (14 May 2024)
CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera (06 Feb 2024)
FLatten Transformer: Vision Transformer using Focused Linear Attention
Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang (01 Aug 2023)
ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths
Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang (12 Jun 2022)
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed (28 Jul 2020)