SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

Chao Wang
Bei Li
Jiaqi Zhang
Xinyu Liu
Yuchun Fan
Linkun Lyu
Xin Chen
Jingang Wang
Tong Xiao
Peng Pei
Xunliang Cai
MoE