ResearchTrend.AI
Gaussian Error Linear Units (GELUs)

27 June 2016
Dan Hendrycks
Kevin Gimpel
arXiv:1606.08415

Papers citing "Gaussian Error Linear Units (GELUs)"

30 / 780 papers shown
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
41
754
0
24 Jun 2020
Categorical Normalizing Flows via Continuous Transformations
Phillip Lippe
E. Gavves
BDL
15
43
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
24
432
0
11 Jun 2020
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Anne Lauscher
Olga Majewska
Leonardo F. R. Ribeiro
Iryna Gurevych
Nikolai Rozanov
Goran Glavaš
KELM
31
79
0
24 May 2020
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
14
21
0
19 May 2020
Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations
Sam Coope
Tyler Farghly
D. Gerz
Ivan Vulić
Matthew Henderson
19
62
0
18 May 2020
UDapter: Language Adaptation for Truly Universal Dependency Parsing
A. Üstün
Arianna Bisazza
G. Bouma
Gertjan van Noord
27
113
0
29 Apr 2020
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
Fanghui Liu
Xiaolin Huang
Yudong Chen
Johan A. K. Suykens
BDL
36
172
0
23 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
12
79
0
06 Apr 2020
Keyphrase Extraction with Span-based Feature Representations
Funan Mu
Zhenting Yu
Lifeng Wang
Yequan Wang
Qingyu Yin
Yibo Sun
Liqun Liu
Teng Ma
Jing Tang
Xing Zhou
24
17
0
13 Feb 2020
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Jingliang Duan
Yang Guan
Shengbo Eben Li
Yangang Ren
B. Cheng
OffRL
15
173
0
09 Jan 2020
Machine Learning from a Continuous Viewpoint
Weinan E
Chao Ma
Lei Wu
23
102
0
30 Dec 2019
TreeGen: A Tree-Based Transformer Architecture for Code Generation
Zeyu Sun
Qihao Zhu
Yingfei Xiong
Yican Sun
Lili Mou
Lu Zhang
17
173
0
22 Nov 2019
Symmetrical Gaussian Error Linear Units (SGELUs)
Chao Yu
Zhiguo Su
4
10
0
10 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson
I. Casanueva
Nikola Mrkšić
Pei-hao Su
Tsung-Hsien Wen
Ivan Vulić
13
196
0
09 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
41
10,583
0
29 Oct 2019
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Yu-An Chung
James R. Glass
SSL
15
173
0
23 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
14
248
0
22 Oct 2019
Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base
Tao Shen
Xiubo Geng
Tao Qin
Daya Guo
Duyu Tang
Nan Duan
Guodong Long
Daxin Jiang
25
81
0
11 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
64
6,370
0
26 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
249
205
0
25 Sep 2019
Cross-Lingual Natural Language Generation via Pre-Training
Zewen Chi
Li Dong
Furu Wei
Wenhui Wang
Xian-Ling Mao
Heyan Huang
19
136
0
23 Sep 2019
Multi-Task Self-Supervised Learning for Disfluency Detection
Shaolei Wang
Wanxiang Che
Qi Liu
Pengda Qin
Ting Liu
William Yang Wang
SSL
14
56
0
15 Aug 2019
A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
Elman Mansimov
Alex Jinpeng Wang
Sean Welleck
Kyunghyun Cho
AIMat
20
46
0
29 May 2019
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schlüter
Hermann Ney
KELM
27
172
0
10 May 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu-Chiang Frank Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
77
1,550
0
08 May 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
11
1,847
0
23 Apr 2019
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Shijie Wu
Mark Dredze
VLM
SSeg
22
671
0
19 Apr 2019
Neural Empirical Bayes
Saeed Saremi
Aapo Hyvärinen
10
65
0
06 Mar 2019
NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation
Anastasis Kratsios
Cody B. Hyndman
OOD
22
17
0
31 Aug 2018