ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2002.04745
  4. Cited By
On Layer Normalization in the Transformer Architecture

On Layer Normalization in the Transformer Architecture

12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
    AI4CE
ArXivPDFHTML

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 141 papers shown
Title
Cross-attention Spatio-temporal Context Transformer for Semantic
  Segmentation of Historical Maps
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
Sidi Wu
Yizi Chen
Konrad Schindler
L. Hurni
19
2
0
19 Oct 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS
GNN
52
3
0
27 Sep 2023
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
30
1
0
20 Sep 2023
Baichuan 2: Open Large-scale Language Models
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
66
701
0
19 Sep 2023
From Sparse to Soft Mixtures of Experts
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
MobileNMT: Enabling Translation in 15MB and 30ms
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
22
1
0
07 Jun 2023
Centered Self-Attention Layers
Centered Self-Attention Layers
Ameen Ali
Tomer Galanti
Lior Wolf
28
6
0
02 Jun 2023
SING: A Plug-and-Play DNN Learning Technique
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
14
0
0
25 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive
  Transformers
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurélien Lucchi
Thomas Hofmann
32
53
0
25 May 2023
Exploiting Fine-Grained DCT Representations for Hiding Image-Level
  Messages within JPEG Images
Exploiting Fine-Grained DCT Representations for Hiding Image-Level Messages within JPEG Images
Junxue Yang
Xin Liao
28
5
0
11 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine
  Translation
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
22
0
0
10 May 2023
What is the best recipe for character-level encoder-only modelling?
What is the best recipe for character-level encoder-only modelling?
Kris Cao
32
2
0
09 May 2023
Coherent Wave Dynamics and Language Generation of a Generative
  Pre-trained Transformer
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Tao Hong
14
0
0
08 May 2023
NoiseTrans: Point Cloud Denoising with Transformers
NoiseTrans: Point Cloud Denoising with Transformers
Guangzhe Hou
G. Qin
Minghui Sun
Yanhua Liang
Jie Yan
Zhonghan Zhang
3DPC
ViT
15
2
0
24 Apr 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
22
1
0
17 Apr 2023
Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks
  with Soft-Thresholding
Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
Chunyan Xiong
Meng Lu
Xiaotong Yu
JIAN-PENG Cao
Zhong Chen
D. Guo
X. Qu
MLT
28
0
0
14 Apr 2023
Scaling Laws for Multilingual Neural Machine Translation
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
28
28
0
19 Feb 2023
V1T: large-scale mouse V1 response prediction using a Vision Transformer
V1T: large-scale mouse V1 response prediction using a Vision Transformer
Bryan M. Li
I. M. Cornacchia
Nathalie L Rochefort
A. Onken
19
8
0
06 Feb 2023
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
  Property Prediction
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Christopher Fifty
Joseph M. Paggi
Ehsan Amid
J. Leskovec
R. Dror
AI4CE
11
0
0
04 Feb 2023
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Wenjie Hao
Hongfei Xu
Lingling Mu
Hongying Zan
MoE
16
4
0
24 Dec 2022
SegAugment: Maximizing the Utility of Speech Translation Data with
  Segmentation-based Augmentations
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussá
31
6
0
19 Dec 2022
Latent Diffusion for Language Generation
Latent Diffusion for Language Generation
Justin Lovelace
Varsha Kishore
Chao-gang Wan
Eliot Shekhtman
Kilian Q. Weinberger
DiffM
19
71
0
19 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
120
36
0
15 Dec 2022
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual
  Machine Translation
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad
Anna Y. Sun
Shruti Bhosale
MoE
41
8
0
15 Dec 2022
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data
Matthias Zeller
Jens Behley
Michael Heidingsfeld
C. Stachniss
24
23
0
07 Dec 2022
Uncertainty-aware Vision-based Metric Cross-view Geolocalization
Uncertainty-aware Vision-based Metric Cross-view Geolocalization
F. Fervers
Sebastian Bullinger
C. Bodensteiner
Michael Arens
Rainer Stiefelhagen
16
39
0
22 Nov 2022
Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text
  Generation via Concentrating Attention
Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li
Xiaoyuan Yi
Jinyi Hu
Maosong Sun
Xing Xie
21
0
0
14 Nov 2022
Target-Speaker Voice Activity Detection via Sequence-to-Sequence
  Prediction
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction
Ming Cheng
Weiqing Wang
Yucong Zhang
Xiaoyi Qin
Ming Li
VLM
48
32
0
28 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
15
0
0
22 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using
  Strips Window Attention
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Xiaogang Xu
Lei Wang
Zaiyan Dai
Jun Yang
ViT
22
23
0
22 Oct 2022
ZITS++: Image Inpainting by Improving the Incremental Transformer on
  Structural Priors
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors
Chenjie Cao
Qiaole Dong
Yanwei Fu
33
30
0
12 Oct 2022
Bridging the Gap to Real-World Object-Centric Learning
Bridging the Gap to Real-World Object-Centric Learning
Maximilian Seitzer
Max Horn
Andrii Zadaianchuk
Dominik Zietlow
Tianjun Xiao
...
Tong He
Zheng-Wei Zhang
Bernhard Schölkopf
Thomas Brox
Francesco Locatello
OCL
37
139
0
29 Sep 2022
Deep Sparse Conformer for Speech Recognition
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
14
2
0
01 Sep 2022
Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language
  Model Erlangshen with Propensity-Corrected Loss
Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss
Junjie Wang
Yuxiang Zhang
Ping Yang
Ruyi Gan
11
2
0
05 Aug 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq
  Model
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gökhan Tür
Premkumar Natarajan
34
82
0
02 Aug 2022
On Mitigating Hard Clusters for Face Clustering
On Mitigating Hard Clusters for Face Clustering
Yingjie Chen
Huasong Zhong
Chong Chen
Chen Shen
Jianqiang Huang
Tao Wang
Yun Liang
Qianru Sun
CVBM
28
12
0
25 Jul 2022
Earthformer: Exploring Space-Time Transformers for Earth System
  Forecasting
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Zhihan Gao
Xingjian Shi
Hao Wang
Yi Zhu
Yuyang Wang
Mu Li
Dit-Yan Yeung
AI4TS
31
145
0
12 Jul 2022
Pure Transformers are Powerful Graph Learners
Pure Transformers are Powerful Graph Learners
Jinwoo Kim
Tien Dat Nguyen
Seonwoo Min
Sungjun Cho
Moontae Lee
Honglak Lee
Seunghoon Hong
32
187
0
06 Jul 2022
SAVi++: Towards End-to-End Object-Centric Learning from Real-World
  Videos
SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
Gamaleldin F. Elsayed
Aravindh Mahendran
Sjoerd van Steenkiste
Klaus Greff
Michael C. Mozer
Thomas Kipf
VOS
OCL
47
136
0
15 Jun 2022
A smile is all you need: Predicting limiting activity coefficients from
  SMILES with natural language processing
A smile is all you need: Predicting limiting activity coefficients from SMILES with natural language processing
Benedikt Winter
Clemens Winter
J. Schilling
A. Bardow
11
28
0
15 Jun 2022
K-Space Transformer for Undersampled MRI Reconstruction
K-Space Transformer for Undersampled MRI Reconstruction
Ziheng Zhao
Tianjiao Zhang
Weidi Xie
Yanfeng Wang
Ya-Qin Zhang
MedIm
19
5
0
14 Jun 2022
Object Scene Representation Transformer
Object Scene Representation Transformer
Mehdi S. M. Sajjadi
Daniel Duckworth
Aravindh Mahendran
Sjoerd van Steenkiste
Filip Pavetić
Mario Luvcić
Leonidas J. Guibas
Klaus Greff
Thomas Kipf
ViT
OCL
22
89
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
Unified Recurrence Modeling for Video Action Anticipation
Unified Recurrence Modeling for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
O. Lanz
19
8
0
02 Jun 2022
FairNorm: Fair and Fast Graph Neural Network Training
FairNorm: Fair and Fast Graph Neural Network Training
Öykü Deniz Köse
Yanning Shen
AI4CE
11
4
0
20 May 2022
Joint Forecasting of Panoptic Segmentations with Difference Attention
Joint Forecasting of Panoptic Segmentations with Difference Attention
Colin Graber
Cyril Jazra
Wenjie Luo
Liangyan Gui
A. Schwing
AI4TS
19
1
0
14 Apr 2022
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding
  Language Models with Model Generated Signals
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj
Chenyan Xiong
Guolin Ke
Xiaodong Liu
Di He
Saurabh Tiwary
Tie-Yan Liu
Paul N. Bennett
Xia Song
Jianfeng Gao
42
32
0
13 Apr 2022
TANet: Thread-Aware Pretraining for Abstractive Conversational
  Summarization
TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization
Ze Yang
Liran Wang
Zhoujin Tian
Wei Yu Wu
Zhoujun Li
14
4
0
09 Apr 2022
Collaborative Transformers for Grounded Situation Recognition
Collaborative Transformers for Grounded Situation Recognition
Junhyeong Cho
Youngseok Yoon
Suha Kwak
ViT
17
25
0
30 Mar 2022
REGTR: End-to-end Point Cloud Correspondences with Transformers
REGTR: End-to-end Point Cloud Correspondences with Transformers
Zi Jian Yew
Gim Hee Lee
3DPC
ViT
24
169
0
28 Mar 2022
Previous
123
Next