ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2406.01895 (Cited By)
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
4 June 2024
Mahdi Sabbaghi, George Pappas, Hamed Hassani, Surbhi Goel

Papers citing "Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks"

9 / 9 papers shown
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach
24 Feb 2025 · Cited by 0
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn
Topics: LRM
04 Feb 2025 · Cited by 0
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee, Ziyang Cai, Avi Schwarzschild, Kangwook Lee, Dimitris Papailiopoulos
Topics: ReLM, VLM, LRM, AI4CE
03 Feb 2025 · Cited by 4
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, ..., Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
07 Oct 2024 · Cited by 1
Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks
Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao
Topics: LRM, OOD, NAI
01 Nov 2022 · Cited by 26
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Yuxuan Li, James L. McClelland
02 Oct 2022 · Cited by 17
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Topics: LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022 · Cited by 8,261
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021 · Cited by 690
A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
06 Jun 2016 · Cited by 1,358