Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav
arXiv 2106.06981 · 13 June 2021
Papers citing "Thinking Like Transformers" (50 of 109 shown):
Lost in Transmission: When and Why LLMs Fail to Reason Globally. Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville. 13 May 2025.
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP). William Bruns. 21 Apr 2025.
TRA: Better Length Generalisation with Threshold Relative Attention. Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao. 29 Mar 2025.
Function Alignment: A New Theory of Mind and Intelligence, Part I: Foundations. Gus G. Xia. 27 Mar 2025.
Meta-Learning Neural Mechanisms rather than Bayesian Priors. Michael Goodale, Salvador Mascarenhas, Yair Lakretz. 20 Mar 2025.
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More. Arvid Frydenlund. 13 Mar 2025.
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases. Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen. 26 Feb 2025.
The Role of Sparsity for Length Generalization in Transformers. Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach. 24 Feb 2025.
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers. Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou. 21 Feb 2025.
Emergent Stack Representations in Modeling Counter Languages Using Transformers. Utkarsh Tiwari, Aviral Gupta, Michael Hahn. 03 Feb 2025.
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models. Yunzhe Hu, Difan Zou, Dong Xu. 26 Nov 2024.
Mixture of Parrots: Experts improve memorization more than reasoning. Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach. 24 Oct 2024.
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs. Xin Ma, Yang Liu, J. Liu, Xiaoxu Ma. 21 Oct 2024.
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs. Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Z. Li, Liwei Wang. 17 Oct 2024.
The Mystery of the Pathological Path-star Task for Language Models. Arvid Frydenlund. 17 Oct 2024.
Hypothesis Testing the Circuit Hypothesis in LLMs. Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei. 16 Oct 2024.
Learning Linear Attention in Polynomial Time. Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas. 14 Oct 2024.
Can Transformers Reason Logically? A Study in SAT Solving. Leyan Pan, Vijay Ganesh, Jacob Abernethy, Chris Esposo, Wenke Lee. 09 Oct 2024.
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar. 07 Oct 2024.
Mechanistic? Naomi Saphra, Sarah Wiegreffe. 07 Oct 2024.
Autoregressive Large Language Models are Computationally Universal. Dale Schuurmans, Hanjun Dai, Francesco Zanini. 04 Oct 2024.
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient. George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet. 03 Oct 2024.
Bayes' Power for Explaining In-Context Learning Generalizations. Samuel G. Müller, Noah Hollmann, Frank Hutter. 02 Oct 2024.
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models. Philipp Mondorf, Sondre Wold, Barbara Plank. 02 Oct 2024.
ENTP: Encoder-only Next Token Prediction. Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee. 02 Oct 2024.
Neural Decompiling of Tracr Transformers. Hannes Thurnherr, Kaspar Riesen. 29 Sep 2024.
Towards Narrowing the Generalization Gap in Deep Boolean Networks. Youngsung Kim. 06 Sep 2024.
How transformers learn structured data: insights from hierarchical filtering. Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti. 27 Aug 2024.
Learning Randomized Algorithms with Transformers. J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger. 20 Aug 2024.
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers. MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic. 10 Aug 2024.
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression. Xingwu Chen, Lei Zhao, Difan Zou. 08 Aug 2024.
Transformers on Markov Data: Constant Depth Suffices. Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael C. Gastpar, Ashok Vardhan Makkuva. 25 Jul 2024.
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques. Rohan Gupta, Iván Arcuschin, Thomas Kwa, Adrià Garriga-Alonso. 19 Jul 2024.
Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach. Nils Palumbo, Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina S. Pasareanu, Somesh Jha. 18 Jul 2024.
Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
Transformer Circuit Faithfulness Metrics are not Robust. Joseph Miller, Bilal Chughtai, William Saunders. 11 Jul 2024.
Algorithmic Language Models with Neurally Compiled Libraries. Lucas Saldyt, Subbarao Kambhampati. 06 Jul 2024.
Universal Length Generalization with Turing Programs. Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach. 03 Jul 2024.
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models. Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao. 02 Jul 2024.
Memory³: Language Modeling with Explicit Memory. Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, ..., Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E. 01 Jul 2024.
Revisiting Random Walks for Learning on Graphs. Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade, Youngmin Ryou, Seunghoon Hong. 01 Jul 2024.
MatText: Do Language Models Need More than Text & Scale for Materials Modeling? Nawaf Alampara, Santiago Miret, K. Jablonka. 25 Jun 2024.
Finding Transformer Circuits with Edge Pruning. Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen. 24 Jun 2024.
Separations in the Representational Capabilities of Transformers and Recurrent Architectures. S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade. 13 Jun 2024.
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans. Ludvig Ericson, Patric Jensfelt. 13 Jun 2024.
Universal In-Context Approximation By Prompting Fully Recurrent Models. Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H. S. Torr, Adel Bibi. 03 Jun 2024.
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits. Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt. 03 Jun 2024.
Language Models Need Inductive Biases to Count Inductively. Yingshan Chang, Yonatan Bisk. 30 May 2024.
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models. Charles O'Neill, Thang Bui. 21 May 2024.
Natural Language Processing RELIES on Linguistics. Juri Opitz, Shira Wein, Nathan Schneider. 09 May 2024.