Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.04745
Cited By
On Layer Normalization in the Transformer Architecture
12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On Layer Normalization in the Transformer Architecture"
50 / 136 papers shown
Title
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation
Chenxi Liu
Hao Miao
Qianxiong Xu
Shaowen Zhou
Cheng Long
Yan Zhao
Ziyue Li
Rui Zhao
AI4TS
35
1
0
04 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
94
1
0
01 May 2025
GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers
Xinyu Li
Qi Yao
Y. Wang
DiffM
41
0
0
30 Apr 2025
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data
Manuel Weber
Carly Beneke
ViT
61
0
0
26 Apr 2025
IgCraft: A versatile sequence generation framework for antibody discovery and engineering
Matthew Greenig
Haowen Zhao
Vladimir Radenkovic
Aubin Ramon
Pietro Sormanni
44
0
0
25 Mar 2025
Object-Centric World Model for Language-Guided Manipulation
Youngjoon Jeong
Junha Chun
S. Cha
Taesup Kim
OCL
VGen
114
1
0
08 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
46
0
0
06 Mar 2025
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill
Ashish Sabharwal
53
4
0
05 Mar 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
63
1
0
24 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
35
0
0
24 Feb 2025
Data Analysis Prediction over Multiple Unseen Datasets: A Vector Embedding Approach
Andreas Loizou
Dimitrios Tsoumakos
36
0
0
24 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
A. Laio
Sebastian Goldt
71
8
0
17 Feb 2025
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jiarui Jin
Haoyu Wang
Hongyan Li
Jun Yu Li
Jiahui Pan
Shenda Hong
39
5
0
15 Feb 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
91
2
0
30 Jan 2025
Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations
Yue Yao
Shengchao Yan
Daniel Goehring
Wolfram Burgard
Joerg Reichardt
OODD
40
2
0
28 Jan 2025
Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions
Qianying Cao
Shanqing Liu
Alan John Varghese
Jérome Darbon
M. Triantafyllou
George Karniadakis
AI4TS
122
0
0
21 Jan 2025
MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition
Yanjie Cui
Xiaohong Liu
Jing Liang
Yamin Fu
57
1
0
17 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
42
1
0
12 Jan 2025
Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation
Zhi Qu
Yiran Wang
Jiannan Mao
Chenchen Ding
Hideki Tanaka
Masao Utiyama
Taro Watanabe
LRM
40
0
0
06 Jan 2025
Generative Pretrained Embedding and Hierarchical Irregular Time Series Representation for Daily Living Activity Recognition
Damien Bouchabou
S. Nguyen
AI4TS
32
0
0
27 Dec 2024
VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction
Khai Phan Tran
Wen Hua
Xue Li
SyDa
85
0
0
18 Dec 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
65
5
0
22 Oct 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph
Jerome Sieber
M. Zeilinger
Carmen Amo Alonso
33
0
0
14 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
33
6
0
14 Oct 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
28
0
0
12 Oct 2024
MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks
N. Kim
Seongsu Kim
Minsu Kim
Jinkyoo Park
Sungsoo Ahn
AI4CE
33
0
0
07 Oct 2024
Error Correction Code Transformer: From Non-Unified to Unified
Yongli Yan
Jieao Zhu
Tianyue Zheng
Jiaqi He
Linglong Dai
21
1
0
04 Oct 2024
Selective Attention Improves Transformer
Yaniv Leviathan
Matan Kalman
Yossi Matias
49
8
0
03 Oct 2024
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Zhaoxi Chen
Jiaxiang Tang
Yuhao Dong
Ziang Cao
Fangzhou Hong
...
Tong Wu
Shunsuke Saito
Liang Pan
Dahua Lin
Ziwei Liu
49
16
0
19 Sep 2024
Diffusion Guided Language Modeling
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
36
6
0
08 Aug 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serra
33
2
0
08 Jul 2024
E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness
Robin Courant
Nicolas Dufour
Xi Wang
Marc Christie
Vicky Kalogeiton
VGen
36
4
0
01 Jul 2024
Coding for Intelligence from the Perspective of Category
Wenhan Yang
Zixuan Hu
Lilang Lin
Jiaying Liu
Ling-Yu Duan
AI4CE
33
1
0
01 Jul 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
36
5
0
24 Jun 2024
DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting
Yuxuan Shu
Vasileios Lampos
AI4TS
AI4CE
58
0
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
64
55
0
11 Jun 2024
A Survey of Transformer Enabled Time Series Synthesis
Alexander Sommers
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
Thomas Arnold
AI4TS
33
2
0
04 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
Are queries and keys always relevant? A case study on Transformer wave functions
Riccardo Rende
Luciano Loris Viteritti
24
5
0
29 May 2024
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
Michal Nauman
M. Ostaszewski
Krzysztof Jankowski
Piotr Milo's
Marek Cygan
OffRL
37
16
0
25 May 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Sergei Krutikov
Bulat Khaertdinov
Rodion Kiriukhin
Shubham Agrawal
Kees Jan de Vries
LMTD
30
0
0
22 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
31
6
0
14 May 2024
Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification
Gousia Habib
Ishfaq Ahmed Malik
Jameel Ahmad
Imtiaz Ahmed
Shaima Qureshi
29
0
0
01 Apr 2024
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
Kuan-Hsun Ho
J. Hung
Berlin Chen
34
0
0
04 Mar 2024
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath
H. Khadilkar
Pushpak Bhattacharyya
26
3
0
23 Feb 2024
DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction
Qilong Ma
Haixu Wu
Lanxiang Xing
Jianmin Wang
Mingsheng Long
AI4CE
19
0
0
04 Feb 2024
Accelerating Material Property Prediction using Generically Complete Isometry Invariants
Jonathan Balasingham
Viktor Zamaraev
V. Kurlin
14
5
0
22 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
20
5
0
09 Jan 2024
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
44
12
0
28 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
Sidi Wu
Yizi Chen
Konrad Schindler
L. Hurni
19
2
0
19 Oct 2023
1
2
3
Next