Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
23 May 2023 · arXiv:2305.13571
Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge
Tags: VLM, MILM
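The title's claim can be illustrated with a toy numerical check: under causal self-attention without positional embeddings, the output at position t aggregates over t+1 tokens, so its variance shrinks with position and therefore carries latent positional information. The sketch below assumes uniform causal attention over i.i.d. unit-variance value vectors; it is only an illustration of that effect, not the paper's actual analysis or model.

```python
import numpy as np

# Toy check: with uniform causal attention and no positional embeddings,
# position t averages t+1 i.i.d. value vectors, so the variance of the
# attention output is roughly 1/(t+1) and thus encodes position.
rng = np.random.default_rng(0)
n_seqs, seq_len, d_model = 2000, 32, 64

# i.i.d. unit-variance "value" vectors standing in for token representations
# entering a self-attention layer that has no positional information.
values = rng.standard_normal((n_seqs, seq_len, d_model))

# Uniform causal attention: output at position t is the mean of values[:, :t+1].
causal_mean = np.cumsum(values, axis=1) / np.arange(1, seq_len + 1)[None, :, None]

# Empirical variance of the attention output at each position,
# estimated over sequences and feature dimensions.
var_per_position = causal_mean.var(axis=(0, 2))

for t in [0, 1, 3, 7, 15, 31]:
    print(f"position {t:2d}: variance ~ {var_per_position[t]:.3f} "
          f"(1/(t+1) = {1 / (t + 1):.3f})")
```

With unit-variance inputs, the printed variances track 1/(t+1), matching the intuition that later positions average over more tokens and so have smaller output variance.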
Papers citing "Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings" (4 of 4 papers shown)
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
Yijiong Yu, Huiqiang Jiang, Xufang Luo, Qianhui Wu, Chin-Yew Lin, Dongsheng Li, Yuqing Yang, Yongfeng Huang, L. Qiu
04 Jun 2024
Breaking Symmetry When Training Transformers
Chunsheng Zuo, Michael Guerzhoy
06 Feb 2024
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy
Tags: MoE, AI4CE
27 Oct 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Tags: AIMat
31 Dec 2020