Monotonic Location Attention for Length Generalization

31 May 2023

Jishnu Ray Chowdhury

Cornelia Caragea

Papers citing "Monotonic Location Attention for Length Generalization"

TRA: Better Length Generalisation with Threshold Relative Attention. Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao. 29 Mar 2025.
Training-Free Long-Context Scaling of Large Language Models. Chen An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong. 27 Feb 2024.
Your Transformer May Not be as Powerful as You Expect. Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He. 26 May 2022.
LISA: Learning Interpretable Skill Abstractions from Language. Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon. 28 Feb 2022.
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.
PonderNet: Learning to Ponder. Andrea Banino, Jan Balaguer, Charles Blundell. 12 Jul 2021.
Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.