What Algorithms can Transformers Learn? A Study in Length Generalization
arXiv:2310.16028 · 24 October 2023
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
Papers citing "What Algorithms can Transformers Learn? A Study in Length Generalization" (50 of 93 papers shown):
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns · 21 Apr 2025

On Vanishing Variance in Transformer Length Generalization
Ruining Li, Gabrijel Boduljak, Jensen Zhou · 03 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao · 29 Mar 2025

Graph neural networks extrapolate out-of-distribution for shortest paths
Robert Nerem, Samantha Chen, Sanjoy Dasgupta, Yusu Wang · 24 Mar 2025

When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu · 14 Mar 2025

Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund · LRM · 13 Mar 2025

Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney, Liangze Jiang, Florin Gogianu, Ehsan Abbasnejad · 13 Mar 2025

Context-aware Biases for Length Extrapolation
Ali Veisi, Amir Mansourian · 11 Mar 2025

Synthetic Tabular Data Detection In the Wild
G. C. N. Kindji, Elisa Fromont, L. Rojas-Barahona, Tanguy Urvoy · LMTD · 03 Mar 2025

All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Gokul Swamy, Sanjiban Choudhury, Wen Sun, Zhiwei Steven Wu, J. Andrew Bagnell · OffRL · 03 Mar 2025

The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs
Tanja Baeumel, Josef van Genabith, Simon Ostermann · LRM · 27 Feb 2025

Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Yifan Zhang, Wenyu Du, Dongming Jin, Jie Fu, Zhi Jin · LRM · 27 Feb 2025

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen · AI4CE · 26 Feb 2025

Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao, Tian Qin, David Alvarez-Melis, Sham Kakade, Naomi Saphra · LRM · 24 Feb 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou · 24 Feb 2025

The Role of Sparsity for Length Generalization in Transformers
Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach · 24 Feb 2025

Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Muller, Jan Tonshoff, ..., Michael M. Bronstein, Mathias Niepert, Bryan Perozzi, Mikhail Galkin, Christopher Morris · OOD · 21 Feb 2025

Trustworthy AI on Safety, Bias, and Privacy: A Survey
Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim · 11 Feb 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn · LRM · 04 Feb 2025

Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari, Aviral Gupta, Michael Hahn · 03 Feb 2025

Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee, Ziyang Cai, Avi Schwarzschild, Kangwook Lee, Dimitris Papailiopoulos · ReLM, VLM, LRM, AI4CE · 03 Feb 2025

Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong · 31 Dec 2024

Cross-table Synthetic Tabular Data Detection
G. C. N. Kindji, L. Rojas-Barahona, Elisa Fromont, Tanguy Urvoy · LMTD · 17 Dec 2024

Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang, Xiangyu Chang, Mingchen Li, A. Roy-Chowdhury, J. Chen, Samet Oymak · 19 Nov 2024

Quantifying artificial intelligence through algebraic generalization
Takuya Ito, Murray Campbell, L. Horesh, Tim Klinger, Parikshit Ram · ELM · 08 Nov 2024

Number Cookbook: Number Understanding of Language Models and How to Improve It
Haotong Yang, Yi Hu, Shijia Kang, Zhouchen Lin, Muhan Zhang · LRM · 06 Nov 2024

LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
Mufei Li, Viraj Shitole, Eli Chien, Changhai Man, Zhaodong Wang, Srinivas Sridharan, Ying Zhang, Tushar Krishna, P. Li · 04 Nov 2024

Provable Length Generalization in Sequence Prediction via Spectral Filtering
Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan · 01 Nov 2024

Transformers to Predict the Applicability of Symbolic Integration Routines
Rashid Barket, Uzma Shafiq, Matthew England, Juergen Gerhard · 31 Oct 2024

Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach · MoE · 24 Oct 2024

In-context learning and Occam's razor
Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie · 17 Oct 2024

How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Z. Li, Liwei Wang · LRM · 17 Oct 2024

The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund · LRM · 17 Oct 2024

How much do contextualized representations encode long-range context?
Simeng Sun, Cheng-Ping Hsieh · 16 Oct 2024

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song · 15 Oct 2024

Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Yang Chen, Yitao Liang, Zhouchen Lin · 11 Oct 2024

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar · AIMat, LRM · 07 Oct 2024

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, ..., Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li · 07 Oct 2024

Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias · 03 Oct 2024

ENTP: Encoder-only Next Token Prediction
Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee · 02 Oct 2024

Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia
Zhejian Zhou, Jiayu Wang, Dahua Lin, Kai Chen · LRM · 25 Sep 2024

Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
Anna Mészáros, Szilvia Ujváry, Wieland Brendel, Patrik Reizinger, Ferenc Huszár · 09 Sep 2024

On the Design Space Between Transformers and Recursive Neural Nets
Jishnu Ray Chowdhury, Cornelia Caragea · 03 Sep 2024

Writing in the Margins: Better Inference Pattern for Long Context Retrieval
M. Russak, Umar Jamil, Christopher Bryant, Kiran Kamble, Axel Magnuson, Mateusz Russak, Waseem Alshikh · 27 Aug 2024

Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic · 10 Aug 2024

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou · 08 Aug 2024

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen · 15 Jul 2024

Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach · 03 Jul 2024

MatText: Do Language Models Need More than Text & Scale for Materials Modeling?
Nawaf Alampara, Santiago Miret, K. Jablonka · 25 Jun 2024

FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation
Bangzheng Li, Ben Zhou, Xingyu Fu, Fei Wang, Dan Roth, Muhao Chen · 17 Jun 2024