Pay Attention to MLPs (arXiv 2105.08050)

17 May 2021
Hanxiao Liu
Zihang Dai
David R. So
Quoc V. Le
AI4CE

Papers citing "Pay Attention to MLPs"

Showing 50 of 303 citing papers.
Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers
John Guibas
Morteza Mardani
Zongyi Li
Andrew Tao
Anima Anandkumar
Bryan Catanzaro
19
227
0
24 Nov 2021
MetaFormer Is Actually What You Need for Vision
Weihao Yu
Mi Luo
Pan Zhou
Chenyang Si
Yichen Zhou
Xinchao Wang
Jiashi Feng
Shuicheng Yan
26
872
0
22 Nov 2021
PointMixer: MLP-Mixer for Point Cloud Understanding
Jaesung Choe
Chunghyun Park
François Rameau
Jaesik Park
In So Kweon
3DPC
32
98
0
22 Nov 2021
Are Transformers More Robust Than CNNs?
Yutong Bai
Jieru Mei
Alan Yuille
Cihang Xie
ViT
AAML
183
258
0
10 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Yinghui Li
Li Tao
Dun Liang
Haitao Zheng
79
96
0
07 Nov 2021
Convolutional Gated MLP: Combining Convolutions & gMLP
A. Rajagopal
V. Nirmala
26
14
0
06 Nov 2021
Arbitrary Distribution Modeling with Censorship in Real-Time Bidding Advertising
Xu Li
Michelle Ma Zhang
Youjun Tong
Zhenya Wang
14
9
0
26 Oct 2021
Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation
Shichang Zhang
Yozen Liu
Yizhou Sun
Neil Shah
31
173
0
17 Oct 2021
Attention-Free Keyword Spotting
Mashrur M. Morshed
Ahmad Omar Ahsan
25
9
0
14 Oct 2021
SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions
Vinod Ganesan
Gowtham Ramesh
Pratyush Kumar
31
9
0
10 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
Jihao Liu
Hongsheng Li
Guanglu Song
Xin Huang
Yu Liu
ViT
29
35
0
08 Oct 2021
Deep Instance Segmentation with Automotive Radar Detection Points
Jianan Liu
Weiyi Xiong
Liping Bai
Yu Xia
Tao Huang
Wanli Ouyang
Bing Zhu
46
53
0
05 Oct 2021
General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
Lukas Galke
Isabelle Cuber
Christophe Meyer
Henrik Ferdinand Nölscher
Angelina Sonderecker
A. Scherp
28
2
0
17 Sep 2021
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
Chuanxin Tang
Yucheng Zhao
Guangting Wang
Chong Luo
Wenxuan Xie
Wenjun Zeng
MoE
ViT
27
98
0
12 Sep 2021
ConvMLP: Hierarchical Convolutional MLPs for Vision
Jiachen Li
Ali Hassani
Steven Walton
Humphrey Shi
35
65
0
09 Sep 2021
Cross-token Modeling with Conditional Computation
Yuxuan Lou
Fuzhao Xue
Zangwei Zheng
Yang You
MoE
22
19
0
05 Sep 2021
SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models
Yogesh Kumar
Alexander Ilin
H. Salo
S. Kulathinal
M. Leinonen
Pekka Marttinen
AI4TS
MedIm
20
0
0
31 Aug 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
38
105
0
30 Aug 2021
A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP
Yucheng Zhao
Guangting Wang
Chuanxin Tang
Chong Luo
Wenjun Zeng
Zhengjun Zha
26
69
0
30 Aug 2021
MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation
Hojoon Lee
Dongyoon Hwang
Sunghwan Hong
Changyeon Kim
Seungryong Kim
Jaegul Choo
11
10
0
17 Aug 2021
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Yuki Tatsunami
Masato Taki
19
12
0
09 Aug 2021
S$^2$-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
37
50
0
02 Aug 2021
Structure and Performance of Fully Connected Neural Networks: Emerging Complex Network Properties
Leonardo F. S. Scabini
Odemir M. Bruno
GNN
6
51
0
29 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
19
231
0
21 Jul 2021
AS-MLP: An Axial Shifted MLP Architecture for Vision
Dongze Lian
Zehao Yu
Xing Sun
Shenghua Gao
14
189
0
18 Jul 2021
Visual Transformer with Statistical Test for COVID-19 Classification
Chih-Chung Hsu
Guan-Lin Chen
Mei-Hsuan Wu
ViT
MedIm
61
15
0
12 Jul 2021
What Makes for Hierarchical Vision Transformer?
Yuxin Fang
Xinggang Wang
Rui Wu
Wenyu Liu
ViT
11
9
0
05 Jul 2021
Global Filter Networks for Image Classification
Yongming Rao
Wenliang Zhao
Zheng Zhu
Jiwen Lu
Jie Zhou
ViT
12
450
0
01 Jul 2021
Multi-Exit Vision Transformer for Dynamic Inference
Arian Bakhtiarnia
Qi Zhang
Alexandros Iosifidis
28
26
0
29 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
40
26
0
28 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li Yuan
Ming-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT
MLLM
24
205
0
23 Jun 2021
Towards Biologically Plausible Convolutional Networks
Roman Pogodin
Yash Mehta
Timothy Lillicrap
P. Latham
26
22
0
22 Jun 2021
MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis
Jaesung Tae
Hyeongju Kim
Younggun Lee
6
14
0
15 Jun 2021
S$^2$-MLP: Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
39
186
0
14 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
Qi Han
Zejia Fan
Qi Dai
Lei Sun
Ming-Ming Cheng
Jiaying Liu
Jingdong Wang
ViT
16
105
0
08 Jun 2021
A Lightweight and Gradient-Stable Neural Layer
Yueyao Yu
Yin Zhang
21
0
0
08 Jun 2021
Vision Transformers with Hierarchical Attention
Yun Liu
Yu-Huan Wu
Guolei Sun
Le Zhang
Ajad Chhatkuli
Luc Van Gool
ViT
24
32
0
06 Jun 2021
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
11
320
0
03 Jun 2021
An Attention Free Transformer
Shuangfei Zhai
Walter A. Talbott
Nitish Srivastava
Chen Huang
Hanlin Goh
Ruixiang Zhang
J. Susskind
ViT
19
127
0
28 May 2021
ResMLP: Feedforward networks for image classification with data-efficient training
Hugo Touvron
Piotr Bojanowski
Mathilde Caron
Matthieu Cord
Alaaeldin El-Nouby
...
Gautier Izacard
Armand Joulin
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
VLM
16
655
0
07 May 2021
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition
Xiaohan Ding
Chunlong Xia
X. Zhang
Xiaojie Chu
Jungong Han
Guiguang Ding
15
92
0
05 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
250
2,603
0
04 May 2021
A Practical Survey on Faster and Lighter Transformers
Quentin Fournier
G. Caron
Daniel Aloise
6
93
0
26 Mar 2021
Can Vision Transformers Learn without Natural Images?
Kodai Nakashima
Hirokatsu Kataoka
Asato Matsumoto
K. Iwata
Nakamasa Inoue
ViT
17
34
0
24 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
269
3,622
0
24 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
267
179
0
17 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
272
979
0
27 Jan 2021
Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks
Zhengyan Zhang
Guangxuan Xiao
Yongwei Li
Tian Lv
Fanchao Qi
Zhiyuan Liu
Yasheng Wang
Xin Jiang
Maosong Sun
AAML
21
67
0
18 Jan 2021
Not all parameters are born equal: Attention is mostly what you need
Nikolay Bogoychev
MoE
22
7
0
22 Oct 2020