arXiv:1606.08415
Gaussian Error Linear Units (GELUs)
27 June 2016
Dan Hendrycks
Kevin Gimpel
Papers citing "Gaussian Error Linear Units (GELUs)"
50 / 780 papers shown
Title
AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data
Ruoqi Wang
Ziwang Huang
Haitao Wang
Hejun Wu
10
6
0
28 Aug 2021
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
Zhijian Liu
Simon Stent
Jie Li
John Gideon
Song Han
VLM
25
10
0
26 Aug 2021
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
Fanglei Xue
Qiangchang Wang
G. Guo
ViT
39
183
0
25 Aug 2021
Deep neural networks approach to microbial colony detection -- a comparative analysis
Sylwia Majchrowska
J. Pawlowski
Natalia Czerep
Aleksander Górecki
Jakub Kuciński
Tomasz Golan
13
5
0
23 Aug 2021
MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation
Hojoon Lee
Dongyoon Hwang
Sunghwan Hong
Changyeon Kim
Seungryong Kim
Jaegul Choo
27
10
0
17 Aug 2021
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Yuki Tatsunami
Masato Taki
24
12
0
09 Aug 2021
Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
Junyuan Gao
Maoguo Gong
Xuelong Li
ViT
19
46
0
02 Aug 2021
PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
Yu Fu
Tianyang Xu
Xiaojun Wu
J. Kittler
ViT
21
37
0
29 Jul 2021
Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign Language Recognition
Pan Xie
Zhi Cui
Yao Du
Mengyi Zhao
Jianwei Cui
Bin Wang
Xiaohui Hu
SLR
23
32
0
27 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
19
231
0
21 Jul 2021
Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data
Jacob Kelly
R. Zemel
Will Grathwohl
36
2
0
19 Jul 2021
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta
Sara Papi
Matteo Negri
Marco Turchi
20
10
0
19 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Philip H. S. Torr
50
27
0
13 Jul 2021
Activated Gradients for Deep Neural Networks
Mei Liu
Liangming Chen
Xiaohao Du
Long Jin
Mingsheng Shang
ODL
AI4CE
19
135
0
09 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
36
259
0
01 Jul 2021
Simple Training Strategies and Model Scaling for Object Detection
Xianzhi Du
Barret Zoph
Wei-Chih Hung
Tsung-Yi Lin
ObjD
31
40
0
30 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
45
26
0
28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
29
1,607
0
25 Jun 2021
IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Rameswar Panda
Yifan Jiang
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM
ViT
39
153
0
23 Jun 2021
Dealing with training and test segmentation mismatch: FBK@IWSLT2021
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
31
6
0
23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Ming-Ming Cheng
ViT
29
219
0
22 Jun 2021
OadTR: Online Action Detection with Transformers
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Yuanjie Shao
Zhe Zuo
Changxin Gao
Nong Sang
OffRL
ViT
34
109
0
21 Jun 2021
Multi-mode Transformer Transducer with Stochastic Future Context
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
30
9
0
17 Jun 2021
Multi-head or Single-head? An Empirical Comparison for Transformer Training
Liyuan Liu
Jialu Liu
Jiawei Han
21
32
0
17 Jun 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
12
575
0
10 Jun 2021
Programming Puzzles
Tal Schuster
A. Kalyan
Oleksandr Polozov
Adam Tauman Kalai
ELM
15
32
0
10 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
22
10
0
10 Jun 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
Swaroop Mishra
Anjana Arunkumar
26
24
0
10 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
49
1,167
0
09 Jun 2021
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Rabeeh Karimi Mahabadi
James Henderson
Sebastian Ruder
MoE
44
467
0
08 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
32
1,087
0
08 Jun 2021
Reveal of Vision Transformers Robustness against Adversarial Attacks
Ahmed Aldahdooh
W. Hamidouche
Olivier Déforges
ViT
15
56
0
07 Jun 2021
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos
Shaocheng Jia
Xin Pei
W. Yao
S. Wong
3DPC
MDE
38
19
0
07 Jun 2021
Empowering Language Understanding with Counterfactual Reasoning
Fuli Feng
Jizhi Zhang
Xiangnan He
Hanwang Zhang
Tat-Seng Chua
LRM
21
33
0
06 Jun 2021
Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention
Byung-Hoon Kim
Jong Chul Ye
Jae-Jin Kim
32
128
0
27 May 2021
Efficient and Accurate Gradients for Neural SDEs
Patrick Kidger
James Foster
Xuechen Li
Terry Lyons
DiffM
24
60
0
27 May 2021
Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving
Jingliang Duan
Dongjie Yu
Shengbo Eben Li
Wenxuan Wang
Yangang Ren
Ziyu Lin
B. Cheng
22
10
0
24 May 2021
One4all User Representation for Recommender Systems in E-commerce
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Minkyu Kim
Young-Jin Park
Jisu Jeong
Seungjae Jung
25
27
0
24 May 2021
Vision Transformer for Fast and Efficient Scene Text Recognition
Rowel Atienza
ViT
17
144
0
18 May 2021
Link Prediction on N-ary Relational Facts: A Graph-based Approach
Quan Wang
Haifeng Wang
Yajuan Lyu
Yong Zhu
24
44
0
18 May 2021
Sparta: Spatially Attentive and Adversarially Robust Activation
Qing-Wu Guo
Felix Juefei-Xu
Changqing Zhou
Wei Feng
Yang Liu
Song Wang
AAML
22
4
0
18 May 2021
Pay Attention to MLPs
Hanxiao Liu
Zihang Dai
David R. So
Quoc V. Le
AI4CE
39
651
0
17 May 2021
Vision Transformers are Robust Learners
Sayak Paul
Pin-Yu Chen
ViT
19
305
0
17 May 2021
Counterfactual Explanations for Neural Recommenders
Khanh Tran
Azin Ghazimatin
Rishiraj Saha Roy
AAML
CML
52
65
0
11 May 2021
ResMLP: Feedforward networks for image classification with data-efficient training
Hugo Touvron
Piotr Bojanowski
Mathilde Caron
Matthieu Cord
Alaaeldin El-Nouby
...
Gautier Izacard
Armand Joulin
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
VLM
21
655
0
07 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
271
2,603
0
04 May 2021
SpookyNet: Learning Force Fields with Electronic Degrees of Freedom and Nonlocal Effects
Oliver T. Unke
Stefan Chmiela
M. Gastegger
Kristof T. Schütt
H. E. Sauceda
K. Müller
171
246
0
01 May 2021
Reconstructing nodal pressures in water distribution systems with graph neural networks
Gergely Hajgató
Bálint Gyires-Tóth
Gyorgy Paál
16
14
0
28 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
54
1,222
0
22 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
Yifei Ding
M. Jia
Qiuhua Miao
Yudong Cao
16
268
0
19 Apr 2021