On Layer Normalization in the Transformer Architecture

12 February 2020

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 136 papers shown

Title
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation Chenxi Liu Hao Miao Qianxiong Xu Shaowen Zhou Cheng Long Yan Zhao Ziyue Li Rui Zhao AI4TS 35 1 0 04 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing Piotr Piekos Róbert Csordás Jürgen Schmidhuber MoE VLM 94 1 0 01 May 2025
GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers Xinyu Li Qi Yao Y. Wang DiffM 41 0 0 30 Apr 2025
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data Manuel Weber Carly Beneke ViT 61 0 0 26 Apr 2025
IgCraft: A versatile sequence generation framework for antibody discovery and engineering Matthew Greenig Haowen Zhao Vladimir Radenkovic Aubin Ramon Pietro Sormanni 44 0 0 25 Mar 2025
Object-Centric World Model for Language-Guided Manipulation Youngjoon Jeong Junha Chun S. Cha Taesup Kim OCL VGen 114 1 0 08 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Zhijian Zhuo Yutao Zeng Ya Wang Sijun Zhang Jian Yang Xiaoqing Li Xun Zhou Jinwen Ma 46 0 0 06 Mar 2025
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers William Merrill Ashish Sabharwal 53 4 0 05 Mar 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning Hojoon Lee Youngdo Lee Takuma Seno Donghu Kim Peter Stone Jaegul Choo 63 1 0 24 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Tianjin Huang Haotian Hu Zhenyu (Allen) Zhang Gaojie Jin X. Li ... Tianlong Chen Lu Liu Qingsong Wen Zhangyang Wang Shiwei Liu MQ 35 0 0 24 Feb 2025
Data Analysis Prediction over Multiple Unseen Datasets: A Vector Embedding Approach Andreas Loizou Dimitrios Tsoumakos 36 0 0 24 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers Riccardo Rende Federica Gerace A. Laio Sebastian Goldt 71 8 0 17 Feb 2025
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model Jiarui Jin Haoyu Wang Hongyan Li Jun Yu Li Jiahui Pan Shenda Hong 39 5 0 15 Feb 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization Kelvin Kan Xingjian Li Stanley Osher 91 2 0 30 Jan 2025
Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations Yue Yao Shengchao Yan Daniel Goehring Wolfram Burgard Joerg Reichardt OODD 40 2 0 28 Jan 2025
Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions Qianying Cao Shanqing Liu Alan John Varghese Jérome Darbon M. Triantafyllou George Karniadakis AI4TS 122 0 0 21 Jan 2025
MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition Yanjie Cui Xiaohong Liu Jing Liang Yamin Fu 57 1 0 17 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Tianjin Huang Ziquan Zhu Gaojie Jin Lu Liu Zhangyang Wang Shiwei Liu 42 1 0 12 Jan 2025
Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation Zhi Qu Yiran Wang Jiannan Mao Chenchen Ding Hideki Tanaka Masao Utiyama Taro Watanabe LRM 40 0 0 06 Jan 2025
Generative Pretrained Embedding and Hierarchical Irregular Time Series Representation for Daily Living Activity Recognition Damien Bouchabou S. Nguyen AI4TS 32 0 0 27 Dec 2024
VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction Khai Phan Tran Wen Hua Xue Li SyDa 85 0 0 18 Dec 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models Yuxian Gu Hao Zhou Fandong Meng Jie Zhou Minlie Huang 65 5 0 22 Oct 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse Federico Arangath Joseph Jerome Sieber M. Zeilinger Carmen Amo Alonso 33 0 0 14 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis Weronika Ormaniec Felix Dangel Sidak Pal Singh 33 6 0 14 Oct 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models N. Jha Brandon Reagen OffRL AI4CE 28 0 0 12 Oct 2024
MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks N. Kim Seongsu Kim Minsu Kim Jinkyoo Park Sungsoo Ahn AI4CE 33 0 0 07 Oct 2024
Error Correction Code Transformer: From Non-Unified to Unified Yongli Yan Jieao Zhu Tianyue Zheng Jiaqi He Linglong Dai 21 1 0 04 Oct 2024
Selective Attention Improves Transformer Yaniv Leviathan Matan Kalman Yossi Matias 49 8 0 03 Oct 2024
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Zhaoxi Chen Jiaxiang Tang Yuhao Dong Ziang Cao Fangzhou Hong ... Tong Wu Shunsuke Saito Liang Pan Dahua Lin Ziwei Liu 49 16 0 19 Sep 2024
Diffusion Guided Language Modeling Justin Lovelace Varsha Kishore Yiwei Chen Kilian Q. Weinberger 36 6 0 08 Aug 2024
Sequential Contrastive Audio-Visual Learning Ioannis Tsiamas Santiago Pascual Chunghsin Yeh Joan Serra 33 2 0 08 Jul 2024
E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness Robin Courant Nicolas Dufour Xi Wang Marc Christie Vicky Kalogeiton VGen 36 4 0 01 Jul 2024
Coding for Intelligence from the Perspective of Category Wenhan Yang Zixuan Hu Lilang Lin Jiaying Liu Ling-Yu Duan AI4CE 33 1 0 01 Jul 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning Tianlang Chen Shengjie Luo Di He Shuxin Zheng Tie-Yan Liu Liwei Wang AI4CE 36 5 0 24 Jun 2024
DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting Yuxuan Shu Vasileios Lampos AI4TS AI4CE 58 0 0 11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Liliang Ren Yang Liu Yadong Lu Yelong Shen Chen Liang Weizhu Chen Mamba 64 55 0 11 Jun 2024
A Survey of Transformer Enabled Time Series Synthesis Alexander Sommers Logan Cummins Sudip Mittal Shahram Rahimi Maria Seale Joseph Jaboure Thomas Arnold AI4TS 33 2 0 04 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training Bobby He Lorenzo Noci Daniele Paliotta Imanol Schlag Thomas Hofmann 34 3 0 29 May 2024
Are queries and keys always relevant? A case study on Transformer wave functions Riccardo Rende Luciano Loris Viteritti 24 5 0 29 May 2024
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control Michal Nauman M. Ostaszewski Krzysztof Jankowski Piotr Miłoś Marek Cygan OffRL 37 16 0 25 May 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com Sergei Krutikov Bulat Khaertdinov Rodion Kiriukhin Shubham Agrawal Kees Jan de Vries LMTD 30 0 0 22 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Xueyan Niu Bo Bai Lei Deng Wei Han 31 6 0 14 May 2024
Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification Gousia Habib Ishfaq Ahmed Malik Jameel Ahmad Imtiaz Ahmed Shaima Qureshi 29 0 0 01 Apr 2024
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning Kuan-Hsun Ho J. Hung Berlin Chen 34 0 0 04 Mar 2024
Transformers are Expressive, But Are They Expressive Enough for Regression? Swaroop Nath H. Khadilkar Pushpak Bhattacharyya 26 3 0 23 Feb 2024
DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction Qilong Ma Haixu Wu Lanxiang Xing Jianmin Wang Mingsheng Long AI4CE 19 0 0 04 Feb 2024
Accelerating Material Property Prediction using Generically Complete Isometry Invariants Jonathan Balasingham Viktor Zamaraev V. Kurlin 14 5 0 22 Jan 2024
Setting the Record Straight on Transformer Oversmoothing G. Dovonon M. Bronstein Matt J. Kusner 20 5 0 09 Jan 2024
TorchDEQ: A Library for Deep Equilibrium Models Zhengyang Geng J. Zico Kolter VLM 44 12 0 28 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps Sidi Wu Yizi Chen Konrad Schindler L. Hurni 19 2 0 19 Oct 2023