Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel · 27 June 2016
arXiv:1606.08415 · PDF · HTML
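For context on what the papers listed below build on: the GELU paper defines the activation as GELU(x) = x·Φ(x), with Φ the standard Gaussian CDF, and also gives a tanh-based approximation. A minimal sketch of both forms in plain Python (the function names are ours, for illustration):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU as defined in the paper: x * Phi(x), where Phi is
    # the standard Gaussian CDF, expressed here via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation given in the paper:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```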

Papers citing "Gaussian Error Linear Units (GELUs)"

50 / 780 papers shown
AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data
  Ruoqi Wang, Ziwang Huang, Haitao Wang, Hejun Wu · 6 citations · 28 Aug 2021
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
  Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han · VLM · 10 citations · 26 Aug 2021
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
  Fanglei Xue, Qiangchang Wang, G. Guo · ViT · 183 citations · 25 Aug 2021
Deep neural networks approach to microbial colony detection – a comparative analysis
  Sylwia Majchrowska, J. Pawlowski, Natalia Czerep, Aleksander Górecki, Jakub Kuciński, Tomasz Golan · 5 citations · 23 Aug 2021
MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation
  Hojoon Lee, Dongyoon Hwang, Sunghwan Hong, Changyeon Kim, Seungryong Kim, Jaegul Choo · 10 citations · 17 Aug 2021
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
  Yuki Tatsunami, Masato Taki · 12 citations · 09 Aug 2021
Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
  Junyuan Gao, Maoguo Gong, Xuelong Li · ViT · 46 citations · 02 Aug 2021
PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
  Yu Fu, Tianyang Xu, Xiaojun Wu, J. Kittler · ViT · 37 citations · 29 Jul 2021
Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign Language Recognition
  Pan Xie, Zhi Cui, Yao Du, Mengyi Zhao, Jianwei Cui, Bin Wang, Xiaohui Hu · SLR · 32 citations · 27 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
  Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo · 231 citations · 21 Jul 2021
Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data
  Jacob Kelly, R. Zemel, Will Grathwohl · 2 citations · 19 Jul 2021
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
  Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi · 10 citations · 19 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
  Shuyang Sun, Xiaoyu Yue, S. Bai, Philip H. S. Torr · 27 citations · 13 Jul 2021
Activated Gradients for Deep Neural Networks
  Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, Mingsheng Shang · ODL, AI4CE · 135 citations · 09 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
  Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling · ViT · 259 citations · 01 Jul 2021
Simple Training Strategies and Model Scaling for Object Detection
  Xianzhi Du, Barret Zoph, Wei-Chih Hung, Tsung-Yi Lin · ObjD · 40 citations · 30 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
  Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li · 26 citations · 28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao · ViT, AI4TS · 1,607 citations · 25 Jun 2021
IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformers
  Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, A. Oliva · VLM, ViT · 153 citations · 23 Jun 2021
Dealing with training and test segmentation mismatch: FBK@IWSLT2021
  Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi · 6 citations · 23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
  Yu-Huan Wu, Yun-Hai Liu, Xin Zhan, Ming-Ming Cheng · ViT · 219 citations · 22 Jun 2021
OadTR: Online Action Detection with Transformers
  Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhe Zuo, Changxin Gao, Nong Sang · OffRL, ViT · 109 citations · 21 Jun 2021
Multi-mode Transformer Transducer with Stochastic Future Context
  Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe · 9 citations · 17 Jun 2021
Multi-head or Single-head? An Empirical Comparison for Transformer Training
  Liyuan Liu, Jialu Liu, Jiawei Han · 32 citations · 17 Jun 2021
Scaling Vision with Sparse Mixture of Experts
  C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 575 citations · 10 Jun 2021
Programming Puzzles
  Tal Schuster, A. Kalyan, Oleksandr Polozov, Adam Tauman Kalai · ELM · 32 citations · 10 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
  Corentin Kervadec, Christian Wolf, G. Antipov, M. Baccouche, Madiha Nadri Wolf · 10 citations · 10 Jun 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
  Swaroop Mishra, Anjana Arunkumar · 24 citations · 10 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
  Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan · ViT · 1,167 citations · 09 Jun 2021
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
  Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder · MoE · 467 citations · 08 Jun 2021
A Survey of Transformers
  Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu · ViT · 1,087 citations · 08 Jun 2021
Reveal of Vision Transformers Robustness against Adversarial Attacks
  Ahmed Aldahdooh, W. Hamidouche, Olivier Déforges · ViT · 56 citations · 07 Jun 2021
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos
  Shaocheng Jia, Xin Pei, W. Yao, S. Wong · 3DPC, MDE · 19 citations · 07 Jun 2021
Empowering Language Understanding with Counterfactual Reasoning
  Fuli Feng, Jizhi Zhang, Xiangnan He, Hanwang Zhang, Tat-Seng Chua · LRM · 33 citations · 06 Jun 2021
Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention
  Byung-Hoon Kim, Jong Chul Ye, Jae-Jin Kim · 128 citations · 27 May 2021
Efficient and Accurate Gradients for Neural SDEs
  Patrick Kidger, James Foster, Xuechen Li, Terry Lyons · DiffM · 60 citations · 27 May 2021
Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving
  Jingliang Duan, Dongjie Yu, Shengbo Eben Li, Wenxuan Wang, Yangang Ren, Ziyu Lin, B. Cheng · 10 citations · 24 May 2021
One4all User Representation for Recommender Systems in E-commerce
  Kyuyong Shin, Hanock Kwak, KyungHyun Kim, Minkyu Kim, Young-Jin Park, Jisu Jeong, Seungjae Jung · 27 citations · 24 May 2021
Vision Transformer for Fast and Efficient Scene Text Recognition
  Rowel Atienza · ViT · 144 citations · 18 May 2021
Link Prediction on N-ary Relational Facts: A Graph-based Approach
  Quan Wang, Haifeng Wang, Yajuan Lyu, Yong Zhu · 44 citations · 18 May 2021
Sparta: Spatially Attentive and Adversarially Robust Activation
  Qing-Wu Guo, Felix Juefei-Xu, Changqing Zhou, Wei Feng, Yang Liu, Song Wang · AAML · 4 citations · 18 May 2021
Pay Attention to MLPs
  Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le · AI4CE · 651 citations · 17 May 2021
Vision Transformers are Robust Learners
  Sayak Paul, Pin-Yu Chen · ViT · 305 citations · 17 May 2021
Counterfactual Explanations for Neural Recommenders
  Khanh Tran, Azin Ghazimatin, Rishiraj Saha Roy · AAML, CML · 65 citations · 11 May 2021
ResMLP: Feedforward networks for image classification with data-efficient training
  Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, ..., Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou · VLM · 655 citations · 07 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
  Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 2,603 citations · 04 May 2021
SpookyNet: Learning Force Fields with Electronic Degrees of Freedom and Nonlocal Effects
  Oliver T. Unke, Stefan Chmiela, M. Gastegger, Kristof T. Schütt, H. E. Sauceda, K. Müller · 246 citations · 01 May 2021
Reconstructing nodal pressures in water distribution systems with graph neural networks
  Gergely Hajgató, Bálint Gyires-Tóth, Gyorgy Paál · 14 citations · 28 Apr 2021
Multiscale Vision Transformers
  Haoqi Fan, Bo Xiong, K. Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer · ViT · 1,222 citations · 22 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
  Yifei Ding, M. Jia, Qiuhua Miao, Yudong Cao · 268 citations · 19 Apr 2021