Transformers Learn Shortcuts to Automata

19 October 2022

Papers citing "Transformers Learn Shortcuts to Automata"

35 / 35 papers shown

Partial Answer of How Transformers Learn Automata Tiantian 22 0 0 29 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention Mattia Opper Roland Fernandez P. Smolensky Jianfeng Gao 39 0 0 29 Mar 2025
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers William Merrill Ashish Sabharwal 48 4 0 05 Mar 2025
(How) Do Language Models Track State? Belinda Z. Li Zifan Carl Guo Jacob Andreas LRM 44 0 0 04 Mar 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers Yingyu Liang Zhizhou Sha Zhenmei Shi Zhao-quan Song Yufa Zhou 89 18 0 21 Feb 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Andreas Opedal Haruki Shirakami Bernhard Schölkopf Abulhair Saparov Mrinmaya Sachan LRM 54 1 0 17 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers Alireza Amiri Xinting Huang Mark Rofin Michael Hahn LRM 90 0 0 04 Feb 2025
ICLR: In-Context Learning of Representations Core Francisco Park Andrew Lee Ekdeep Singh Lubana Yongyi Yang Maya Okawa Kento Nishi Martin Wattenberg Hidenori Tanaka AIFin 111 3 0 29 Dec 2024
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues Riccardo Grazzi Julien N. Siems Jörg K.H. Franke Arber Zela Frank Hutter Massimiliano Pontil 84 10 0 19 Nov 2024
Training Neural Networks as Recognizers of Formal Languages Alexandra Butoi Ghazal Khalighinejad Anej Svete Josef Valvoda Ryan Cotterell Brian DuSell NAI 33 2 0 11 Nov 2024
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence İlker Işık R. G. Cinbis Ebru Aydin Gol 26 0 0 22 Oct 2024
Can Transformers Reason Logically? A Study in SAT Solving Leyan Pan Vijay Ganesh Jacob Abernethy Chris Esposo Wenke Lee ReLM LRM 26 0 0 09 Oct 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Zayne Sprague Fangcong Yin Juan Diego Rodriguez Dongwei Jiang Manya Wadhwa Prasann Singhal Xinyu Zhao Xi Ye Kyle Mahowald Greg Durrett ReLM LRM 111 79 0 18 Sep 2024
LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning Lekai Chen Ashutosh Trivedi Alvaro Velasquez 16 0 0 06 Aug 2024
Representing Rule-based Chatbots with Transformers Dan Friedman Abhishek Panigrahi Danqi Chen 56 1 0 15 Jul 2024
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference Anton Xue Avishree Khare Rajeev Alur Surbhi Goel Eric Wong 43 2 0 21 Jun 2024
U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models Song Mei 3DV AI4CE DiffM 31 11 0 29 Apr 2024
The Illusion of State in State-Space Models William Merrill Jackson Petty Ashish Sabharwal 46 43 0 12 Apr 2024
Investigating Recurrent Transformers with Dynamic Halt Jishnu Ray Chowdhury Cornelia Caragea 34 1 0 01 Feb 2024
An Information-Theoretic Analysis of In-Context Learning Hong Jun Jeon Jason D. Lee Qi Lei Benjamin Van Roy 13 18 0 28 Jan 2024
Learning Universal Predictors Jordi Grau-Moya Tim Genewein Marcus Hutter Laurent Orseau Grégoire Delétang ... Anian Ruoss Wenliang Kevin Li Christopher Mattern Matthew Aitchison J. Veness 19 11 0 26 Jan 2024
Extracting Formulae in Many-Valued Logic from Deep Neural Networks Yani Zhang Helmut Bölcskei 19 0 0 22 Jan 2024
On The Expressivity of Recurrent Neural Cascades Nadezda A. Knorozova Alessandro Ronca 18 1 0 14 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks Rahul Ramesh Ekdeep Singh Lubana Mikail Khona Robert P. Dick Hidenori Tanaka CoGe 22 6 0 21 Nov 2023
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining Licong Lin Yu Bai Song Mei OffRL 27 42 0 12 Oct 2023
Schema-learning and rebinding as mechanisms of in-context learning and emergence Siva K. Swaminathan Antoine Dedieu Rajkumar Vasudeva Raju Murray Shanahan Miguel Lazaro-Gredilla Dileep George 21 8 0 16 Jun 2023
Faith and Fate: Limits of Transformers on Compositionality Nouha Dziri Ximing Lu Melanie Sclar Xiang Lorraine Li Liwei Jian ... Sean Welleck Xiang Ren Allyson Ettinger Zaïd Harchaoui Yejin Choi ReLM LRM 28 324 0 29 May 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 486 0 01 Nov 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit Boaz Barak Benjamin L. Edelman Surbhi Goel Sham Kakade Eran Malach Cyril Zhang 25 122 0 18 Jul 2022
Neural Networks and the Chomsky Hierarchy Grégoire Delétang Anian Ruoss Jordi Grau-Moya Tim Genewein L. Wenliang ... Chris Cundy Marcus Hutter Shane Legg Joel Veness Pedro A. Ortega UQCV 94 129 0 05 Jul 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 315 8,261 0 28 Jan 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 237 690 0 27 Aug 2021
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges M. Bronstein Joan Bruna Taco S. Cohen Petar Veličković GNN 166 1,095 0 27 Apr 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,724 0 26 Sep 2016
Benefits of depth in neural networks Matus Telgarsky 123 600 0 14 Feb 2016