Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
arXiv:2402.09371, 14 February 2024
Papers citing "Transformers Can Achieve Length Generalization But Not Robustly" (35 papers):
On Vanishing Variance in Transformer Length Generalization. Ruining Li, Gabrijel Boduljak, Jensen Zhou. 03 Apr 2025.
Distributional Scaling Laws for Emergent Capabilities. Rosie Zhao, Tian Qin, David Alvarez-Melis, Sham Kakade, Naomi Saphra. 24 Feb 2025. [LRM]
The Role of Sparsity for Length Generalization in Transformers. Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach. 24 Feb 2025.
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks. Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Muller, Jan Tonshoff, ..., Michael M. Bronstein, Mathias Niepert, Bryan Perozzi, Mikhail Galkin, Christopher Morris. 21 Feb 2025. [OOD]
Solving Empirical Bayes via Transformers. Anzo Teh, Mark Jabbour, Yury Polyanskiy. 17 Feb 2025.
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models. Haoyang Li, Xuejia Chen, Zhanchao Xu, Darian Li, Nicole Hu, ..., Y. Li, Luyu Qiu, C. Zhang, Qing Li, Lei Chen. 16 Feb 2025. [LRM, ELM]
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges. Nayoung Lee, Ziyang Cai, Avi Schwarzschild, Kangwook Lee, Dimitris Papailiopoulos. 03 Feb 2025. [ReLM, VLM, LRM, AI4CE]
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum. Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel. 08 Jan 2025.
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding. Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Z. Wang. 03 Jan 2025. [KELM]
Quantifying artificial intelligence through algebraic generalization. Takuya Ito, Murray Campbell, L. Horesh, Tim Klinger, Parikshit Ram. 08 Nov 2024. [ELM]
Provable Length Generalization in Sequence Prediction via Spectral Filtering. Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan. 01 Nov 2024.
Mixture of Parrots: Experts improve memorization more than reasoning. Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach. 24 Oct 2024. [MoE]
Stick-breaking Attention. Shawn Tan, Yikang Shen, Songlin Yang, Aaron C. Courville, Rameswar Panda. 23 Oct 2024.
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines. Junyu Lai, Jiahe Xu, Yao Yang, Yunpeng Huang, Chun Cao, Jingwei Xu. 10 Oct 2024. [LRM]
Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects. Wenhao Li, Yudong Xu, Scott Sanner, Elias Boutros Khalil. 08 Oct 2024. [ViT]
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation. Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, ..., Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li. 07 Oct 2024.
GAMformer: In-Context Learning for Generalized Additive Models. Andreas Mueller, Julien N. Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter. 06 Oct 2024. [AI4CE]
Quantifying Generalization Complexity for Large Language Models. Zhenting Qi, Hongyin Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, James Glass. 02 Oct 2024. [LRM, ELM]
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers. MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic. 10 Aug 2024.
Universal Length Generalization with Turing Programs. Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach. 03 Jul 2024.
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data. Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee, Dimitris Papailiopoulos. 27 Jun 2024. [SyDa, RALM]
The CLRS-Text Algorithmic Reasoning Language Benchmark. Larisa Markeeva, Sean McLeish, Borja Ibarz, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild, Petar Veličković. 06 Jun 2024. [LRM]
Language Models Need Inductive Biases to Count Inductively. Yingshan Chang, Yonatan Bisk. 30 May 2024. [LRM]
Transformers Can Do Arithmetic with the Right Embeddings. Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, ..., B. Kailkhura, A. Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein. 27 May 2024.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou. 12 Apr 2024.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis. Yumeng Li, William H. Beluch, M. Keuper, Dan Zhang, Anna Khoreva. 20 Mar 2024. [DiffM, VGen]
Premise Order Matters in Reasoning with Large Language Models. Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou. 14 Feb 2024. [ReLM, LRM]
Improving Black-box Robustness with In-Context Rewriting. Kyle O'Brien, Nathan Ng, Isha Puri, Jorge Mendez, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi, Tom Hartvigsen. 13 Feb 2024.
In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
Neural Networks and the Chomsky Hierarchy. Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, L. Wenliang, ..., Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega. 05 Jul 2022. [UQCV]
The CLRS Algorithmic Reasoning Benchmark. Petar Veličković, Adria Puigdomenech Badia, David Budden, Razvan Pascanu, Andrea Banino, Mikhail Dashevskiy, R. Hadsell, Charles Blundell. 31 May 2022.
Autoformalization with Large Language Models. Yuhuai Wu, Albert Q. Jiang, Wenda Li, M. Rabe, Charles Staats, M. Jamnik, Christian Szegedy. 25 May 2022. [AI4CE]
A Data-Centric Approach for Training Deep Neural Networks with Less Data. Mohammad Motamedi, Nikolay Sakharnykh, T. Kaldewey. 07 Oct 2021.
Primer: Searching for Efficient Transformers for Language Modeling. David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le. 17 Sep 2021. [VLM]
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.