Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
arXiv:2108.12409 · 27 August 2021
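The linear-bias mechanism named in the title (ALiBi) replaces positional embeddings with a fixed, head-specific linear penalty on attention scores, proportional to the query-key distance. Below is a minimal PyTorch sketch of that bias under the slope recipe the paper describes; the function names and shapes are illustrative, not the authors' released implementation.

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Head-specific slopes: a geometric sequence starting at 2^(-8/num_heads),
    # as described in the paper (this simple form assumes num_heads is a power of two).
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # bias[h, i, j] = -slope[h] * (i - j) for keys j <= i; future keys get -inf
    # so a causal softmax ignores them. The result is added to q.k attention logits.
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]                 # rel[i, j] = j - i (<= 0 for the past)
    bias = alibi_slopes(num_heads).view(-1, 1, 1) * rel
    return bias.masked_fill(rel > 0, float("-inf"))   # causal mask folded into the bias

# Usage: pre-softmax logits of shape (batch, heads, seq, seq) get the bias added:
# logits = logits + alibi_bias(num_heads=8, seq_len=1024)
```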
Papers citing "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" (11 of 11 papers shown)
Don't be lazy: CompleteP enables compute-efficient deep transformers · Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Bill Li, Blake Bordelon, Shane Bergsma, C. Pehlevan, Boris Hanin, Joel Hestness · 02 May 2025
MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning · Murtadha Ahmed, Wenbo, Liu Yunfeng · 02 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal · Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu · AAML · 01 May 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models · Kohei Saijo, Tetsuji Ogawa · 28 Apr 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing · Han Liao, Shuaishuai Zu · 11 Apr 2025
Conformal Transformations for Symmetric Power Transformers · Saurabh Kumar, Jacob Buckman, Carles Gelada, Sean Zhang · 05 Mar 2025
Investigating Length Issues in Document-level Machine Translation · Ziqian Peng, Rachel Bawden, François Yvon · 23 Dec 2024
SHAPE: Shifted Absolute Position Embedding for Transformers · Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui · 13 Sep 2021
Shortformer: Better Language Modeling using Shorter Inputs · Ofir Press, Noah A. Smith, M. Lewis · 31 Dec 2020
Efficient Content-Based Sparse Attention with Routing Transformers · Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · MoE · 12 Mar 2020
A Decomposable Attention Model for Natural Language Inference · Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit · 06 Jun 2016