arXiv:2002.04745
On Layer Normalization in the Transformer Architecture
12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
Papers citing "On Layer Normalization in the Transformer Architecture"
41 / 141 papers shown
Title
Error Correction Code Transformer
Yoni Choukroun
Lior Wolf
13
47
0
27 Mar 2022
SolidGen: An Autoregressive Model for Direct B-rep Synthesis
P. Jayaraman
Joseph G. Lambourne
Nishkrit Desai
Karl D. D. Willis
Aditya Sanghi
Nigel Morris
13
49
0
26 Mar 2022
CT-SAT: Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Zhaoyi Liu
Bo Kang
Yun Wang
Dick Botteldooren
ViT
16
5
0
22 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
27
103
0
21 Mar 2022
EventFormer: AU Event Transformer for Facial Action Unit Event Detection
Yingjie Chen
Jiarui Zhang
Tao Wang
Yun Liang
ViT
24
0
0
12 Mar 2022
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets
Yu Shi
Shuxin Zheng
Guolin Ke
Yifei Shen
Jiacheng You
Jiyan He
Shengjie Luo
Chang-Shu Liu
Di He
Tie-Yan Liu
AI4CE
33
65
0
09 Mar 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Qiaole Dong
Chenjie Cao
Yanwei Fu
CLL
11
137
0
02 Mar 2022
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
Yufeng Yang
Peidong Wang
DeLiang Wang
20
12
0
01 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
155
0
01 Mar 2022
General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
35
65
0
15 Feb 2022
Are Transformers More Robust? Towards Exact Robustness Verification for Transformers
B. Liao
Chih-Hong Cheng
Hasan Esen
Alois C. Knoll
AAML
16
1
0
08 Feb 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
102
0
16 Jan 2022
SPTS: Single-Point Text Spotting
Dezhi Peng
Xinyu Wang
Yuliang Liu
Jiaxin Zhang
Mingxin Huang
...
Jing Li
Dahua Lin
Chunhua Shen
Xiang Bai
Lianwen Jin
ViT
16
63
0
15 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
38
686
0
08 Dec 2021
UniLog: Deploy One Model and Specialize it for All Log Analysis Tasks
Yichen Zhu
Weibin Meng
Ying Liu
Shenglin Zhang
Tao Han
Shimin Tao
Dan Pei
MoE
33
14
0
06 Dec 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
41
1,738
0
18 Nov 2021
NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian
Oleksii Hrinchuk
Virginia Adams
Oleksii Kuchaiev
VLM
14
16
0
16 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
22
14
0
01 Nov 2021
Geometric Transformer for End-to-End Molecule Properties Prediction
Yoni Choukroun
Lior Wolf
AI4CE
ViT
17
16
0
26 Oct 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer
Jason Weston
Myle Ott
AI4CE
26
74
0
18 Oct 2021
bert2BERT: Towards Reusable Pretrained Language Models
Cheng Chen
Yichun Yin
Lifeng Shang
Xin Jiang
Yujia Qin
Fengyu Wang
Zhi Wang
Xiao Chen
Zhiyuan Liu
Qun Liu
VLM
22
59
0
14 Oct 2021
Speeding up Deep Model Training by Sharing Weights and Then Unsharing
Shuo Yang
Le Hou
Xiaodan Song
Qiang Liu
Denny Zhou
110
9
0
08 Oct 2021
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
Edoardo Cetin
Oya Celiktutan
OffRL
32
16
0
07 Oct 2021
Multilingual Translation via Grafting Pre-trained Language Models
Zewei Sun
Mingxuan Wang
Lei Li
AI4CE
183
22
0
11 Sep 2021
Global Self-Attention as a Replacement for Graph Convolution
Md Shamim Hussain
Mohammed J. Zaki
D. Subramanian
ViT
22
122
0
07 Aug 2021
WeChat Neural Machine Translation Systems for WMT21
Xianfeng Zeng
Yanjun Liu
Ernan Li
Qiu Ran
Fandong Meng
Peng Li
Jinan Xu
Jie Zhou
17
20
0
05 Aug 2021
Learning Attributed Graph Representations with Communicative Message Passing Transformer
Jianwen Chen
Shuangjia Zheng
Ying Song
Jiahua Rao
Yuedong Yang
17
46
0
19 Jul 2021
Improved Language Identification Through Cross-Lingual Self-Supervised Learning
Andros Tjandra
Diptanu Gon Choudhury
Frank Zhang
Kritika Singh
Alexis Conneau
Alexei Baevski
Assaf Sela
Yatharth Saraf
Michael Auli
VLM
SSL
21
35
0
08 Jul 2021
Do Transformers Really Perform Bad for Graph Representation?
Chengxuan Ying
Tianle Cai
Shengjie Luo
Shuxin Zheng
Guolin Ke
Di He
Yanming Shen
Tie-Yan Liu
GNN
23
432
0
09 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
27
1,084
0
08 Jun 2021
Choose a Transformer: Fourier or Galerkin
Shuhao Cao
29
219
0
31 May 2021
How could Neural Networks understand Programs?
Dinglan Peng
Shuxin Zheng
Yatao Li
Guolin Ke
Di He
Tie-Yan Liu
NAI
13
61
0
10 May 2021
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
...
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
ALM
MoE
AI4CE
27
212
0
26 Apr 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
W. R. Huang
Tom Goldstein
ODL
23
53
0
16 Feb 2021
An Efficient Transformer Decoder with Compressed Sub-layers
Yanyang Li
Ye Lin
Tong Xiao
Jingbo Zhu
17
29
0
03 Jan 2021
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
18
73
0
21 Oct 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
28
3,904
0
10 Apr 2020
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
36
224
0
14 Oct 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
216
1,398
0
04 Dec 2018
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao
Yasaman Bahri
Jascha Narain Sohl-Dickstein
S. Schoenholz
Jeffrey Pennington
220
348
0
14 Jun 2018
OpenNMT: Neural Machine Translation Toolkit
Guillaume Klein
Yoon Kim
Yuntian Deng
Vincent Nguyen
Jean Senellart
Alexander M. Rush
144
119
0
28 May 2018