Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav
13 June 2021 · arXiv:2106.06981

Papers citing "Thinking Like Transformers" (50 of 109 shown):
A Transformer with Stack Attention · Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell · 07 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward · Raphael Milliere, Cameron Buckner · 06 May 2024
Transformers Can Represent n-gram Language Models · Anej Svete, Ryan Cotterell · 23 Apr 2024
On the Empirical Complexity of Reasoning and Planning in LLMs · Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee · 17 Apr 2024
Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers · Andy Yang, David Chiang · 05 Apr 2024
Task Agnostic Architecture for Algorithm Induction via Implicit Composition · Sahil J. Sindhi, Ignas Budvytis · 03 Apr 2024
Transformers as Transducers · Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal · 02 Apr 2024
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks · Xingwu Chen, Difan Zou · 02 Apr 2024
Towards a theory of model distillation · Enric Boix-Adserà · 14 Mar 2024
Simulating Weighted Automata over Sequences and Trees with Transformers · Michael Rizvi, M. Lizaire, Clara Lacroce, Guillaume Rabusseau · 12 Mar 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems · Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma · 20 Feb 2024
Discrete Neural Algorithmic Reasoning · Gleb Rodionov, Liudmila Prokhorenkova · 18 Feb 2024
Why are Sensitive Functions Hard for Transformers? · Michael Hahn, Mark Rofin · 15 Feb 2024
Limits of Transformer Language Models on Learning to Compose Algorithms · Jonathan Thomm, Aleksandar Terzić, Giacomo Camposampiero, Michael Hersche, Bernhard Schölkopf, Abbas Rahimi · 08 Feb 2024
On Provable Length and Compositional Generalization · Kartik Ahuja, Amin Mansouri · 07 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention · Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová · 06 Feb 2024
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning · Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, J. Pierrehumbert · 05 Feb 2024
Extracting Formulae in Many-Valued Logic from Deep Neural Networks · Yani Zhang, Helmut Bölcskei · 22 Jan 2024
Code Simulation Challenges for Large Language Models · Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge · 17 Jan 2024
Carrying over algorithm in transformers · J. Kruthoff · 15 Jan 2024
Interpretability Illusions in the Generalization of Simplified Models · Dan Friedman, Andrew Kyle Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun · 06 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars · Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski · 03 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks · Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka · 21 Nov 2023
Banach-Tarski Embeddings and Transformers · Joshua Maher · 15 Nov 2023
What Formal Languages Can Transformers Express? A Survey · Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin · 01 Nov 2023
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations · Aleksandar Petrov, Philip H. S. Torr, Adel Bibi · 30 Oct 2023
In-Context Learning Dynamics with Random Binary Sequences · Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman · 26 Oct 2023
What Algorithms can Transformers Learn? A Study in Length Generalization · Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran · 24 Oct 2023
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions · Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · 24 Oct 2023
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages · Andy Yang, David Chiang, Dana Angluin · 21 Oct 2023
The Expressive Power of Transformers with Chain of Thought · William Merrill, Ashish Sabharwal · 11 Oct 2023
Logical Languages Accepted by Transformer Encoders with Hard Attention · Pablo Barceló, A. Kozachinskiy, A. W. Lin, Vladimir Podolskii · 05 Oct 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? · Ari Holtzman, Peter West, Luke Zettlemoyer · 31 Jul 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens · Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei · 21 Jul 2023
Trainable Transformer in Transformer · A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora · 03 Jul 2023
Learning Transformer Programs · Dan Friedman, Alexander Wettig, Danqi Chen · 01 Jun 2023
Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula · Kirill Brilliantov, Fedor Pavutnitskiy, D. Pasechnyuk, German Magai · 01 Jun 2023
The Impact of Positional Encoding on Length Generalization in Transformers · Amirhossein Kazemnejad, Inkit Padhi, K. Ramamurthy, Payel Das, Siva Reddy · 31 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective · Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang · 24 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability · Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso · 28 Apr 2023
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks · Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao · 04 Apr 2023
Diffusing Graph Attention · Daniel Glickman, Eran Yahav · 01 Mar 2023
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations · Bilal Chughtai, Lawrence Chan, Neel Nanda · 06 Feb 2023
Looped Transformers as Programmable Computers · Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos · 30 Jan 2023
Tighter Bounds on the Expressivity of Transformer Encoders · David Chiang, Peter A. Cholak, A. Pillay · 25 Jan 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability · David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Tom McGrath, Vladimir Mikulik · 12 Jan 2023
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks · Ankur Sikarwar, Arkil Patel, Navin Goyal · 23 Oct 2022
Transformers Learn Shortcuts to Automata · Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang · 19 Oct 2022
A Logic for Expressing Log-Precision Transformers · William Merrill, Ashish Sabharwal · 06 Oct 2022
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes · Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant · 01 Aug 2022