Thinking Like Transformers (arXiv:2106.06981)
Gail Weiss, Yoav Goldberg, Eran Yahav
13 June 2021 · AI4CE

Papers citing "Thinking Like Transformers"

50 / 109 papers shown
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville · 13 May 2025 · LRM
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns · 21 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao · 29 Mar 2025
Function Alignment: A New Theory of Mind and Intelligence, Part I: Foundations
Gus G. Xia · 27 Mar 2025
Meta-Learning Neural Mechanisms rather than Bayesian Priors
Michael Goodale, Salvador Mascarenhas, Yair Lakretz · 20 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund · 13 Mar 2025 · LRM
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen · 26 Feb 2025 · AI4CE
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach · 24 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song, Yufa Zhou · 21 Feb 2025
Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari, Aviral Gupta, Michael Hahn · 03 Feb 2025
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu, Difan Zou, Dong Xu · 26 Nov 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach · 24 Oct 2024 · MoE
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Xin Ma, Yang Liu, J. Liu, Xiaoxu Ma · 21 Oct 2024
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Z. Li, Liwei Wang · 17 Oct 2024 · LRM
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund · 17 Oct 2024 · LRM
Hypothesis Testing the Circuit Hypothesis in LLMs
Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei · 16 Oct 2024
Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas · 14 Oct 2024
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy, Chris Esposo, Wenke Lee · 09 Oct 2024 · ReLM, LRM
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar · 07 Oct 2024 · AIMat, LRM
Mechanistic?
Naomi Saphra, Sarah Wiegreffe · 07 Oct 2024 · AI4CE
Autoregressive Large Language Models are Computationally Universal
Dale Schuurmans, Hanjun Dai, Francesco Zanini · 04 Oct 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet · 03 Oct 2024 · OffRL
Bayes' Power for Explaining In-Context Learning Generalizations
Samuel G. Müller, Noah Hollmann, Frank Hutter · 02 Oct 2024 · BDL
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Philipp Mondorf, Sondre Wold, Barbara Plank · 02 Oct 2024
ENTP: Encoder-only Next Token Prediction
Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee · 02 Oct 2024
Neural Decompiling of Tracr Transformers
Hannes Thurnherr, Kaspar Riesen · 29 Sep 2024 · ViT
Towards Narrowing the Generalization Gap in Deep Boolean Networks
Youngsung Kim · 06 Sep 2024 · NAI, AI4CE
How transformers learn structured data: insights from hierarchical filtering
Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti · 27 Aug 2024
Learning Randomized Algorithms with Transformers
J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger · 20 Aug 2024 · AAML
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic · 10 Aug 2024
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou · 08 Aug 2024
Transformers on Markov Data: Constant Depth Suffices
Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael C. Gastpar, Ashok Vardhan Makkuva · 25 Jul 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta, Iván Arcuschin, Thomas Kwa, Adrià Garriga-Alonso · 19 Jul 2024
Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach
Nils Palumbo, Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina S. Pasareanu, Somesh Jha · 18 Jul 2024
Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen · 15 Jul 2024
Transformer Circuit Faithfulness Metrics are not Robust
Joseph Miller, Bilal Chughtai, William Saunders · 11 Jul 2024
Algorithmic Language Models with Neurally Compiled Libraries
Lucas Saldyt, Subbarao Kambhampati · 06 Jul 2024 · LRM
Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach · 03 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao · 02 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, ..., Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E · 01 Jul 2024
Revisiting Random Walks for Learning on Graphs
Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade, Youngmin Ryou, Seunghoon Hong · 01 Jul 2024
MatText: Do Language Models Need More than Text & Scale for Materials Modeling?
Nawaf Alampara, Santiago Miret, K. Jablonka · 25 Jun 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen · 24 Jun 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade · 13 Jun 2024 · GNN
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans
Ludvig Ericson, Patric Jensfelt · 13 Jun 2024
Universal In-Context Approximation By Prompting Fully Recurrent Models
Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H. S. Torr, Adel Bibi · 03 Jun 2024 · LRM
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt · 03 Jun 2024
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk · 30 May 2024 · LRM
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
Charles O'Neill, Thang Bui · 21 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz, Shira Wein, Nathan Schneider · 09 May 2024 · AI4CE