Thinking Like Transformers

Gail Weiss, Yoav Goldberg, Eran Yahav. 13 June 2021. arXiv:2106.06981. [AI4CE]

Papers citing "Thinking Like Transformers"

Showing 50 of 109 citing papers (page 1 of 3).

A Transformer with Stack Attention. Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell. 07 May 2024.
A Philosophical Introduction to Language Models - Part II: The Way Forward. Raphael Milliere, Cameron Buckner. 06 May 2024. [LRM]
Transformers Can Represent $n$-gram Language Models. Anej Svete, Ryan Cotterell. 23 Apr 2024.
On the Empirical Complexity of Reasoning and Planning in LLMs. Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee. 17 Apr 2024. [LRM]
Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers. Andy Yang, David Chiang. 05 Apr 2024.
Task Agnostic Architecture for Algorithm Induction via Implicit Composition. Sahil J. Sindhi, Ignas Budvytis. 03 Apr 2024.
Transformers as Transducers. Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal. 02 Apr 2024.
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks. Xingwu Chen, Difan Zou. 02 Apr 2024. [ViT]
Towards a theory of model distillation. Enric Boix-Adserà. 14 Mar 2024. [FedML, VLM]
Simulating Weighted Automata over Sequences and Trees with Transformers. Michael Rizvi, M. Lizaire, Clara Lacroce, Guillaume Rabusseau. 12 Mar 2024. [AI4CE]
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma. 20 Feb 2024. [LRM, AI4CE]
Discrete Neural Algorithmic Reasoning. Gleb Rodionov, Liudmila Prokhorenkova. 18 Feb 2024. [OOD, NAI]
Why are Sensitive Functions Hard for Transformers? Michael Hahn, Mark Rofin. 15 Feb 2024.
Limits of Transformer Language Models on Learning to Compose Algorithms. Jonathan Thomm, Aleksandar Terzić, Giacomo Camposampiero, Michael Hersche, Bernhard Schölkopf, Abbas Rahimi. 08 Feb 2024.
On Provable Length and Compositional Generalization. Kartik Ahuja, Amin Mansouri. 07 Feb 2024. [OODD]
A phase transition between positional and semantic learning in a solvable model of dot-product attention. Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová. 06 Feb 2024. [MLT]
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning. Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, J. Pierrehumbert. 05 Feb 2024. [LRM]
Extracting Formulae in Many-Valued Logic from Deep Neural Networks. Yani Zhang, Helmut Bölcskei. 22 Jan 2024.
Code Simulation Challenges for Large Language Models. Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge. 17 Jan 2024. [LLMAG, LRM]
Carrying over algorithm in transformers. J. Kruthoff. 15 Jan 2024.
Interpretability Illusions in the Generalization of Simplified Models. Dan Friedman, Andrew Kyle Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun. 06 Dec 2023.
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. 03 Dec 2023.
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks. Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka. 21 Nov 2023. [CoGe]
Banach-Tarski Embeddings and Transformers. Joshua Maher. 15 Nov 2023.
What Formal Languages Can Transformers Express? A Survey. Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin. 01 Nov 2023. [AI4CE]
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations. Aleksandar Petrov, Philip H. S. Torr, Adel Bibi. 30 Oct 2023. [VPVLM]
In-Context Learning Dynamics with Random Binary Sequences. Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman. 26 Oct 2023.
What Algorithms can Transformers Learn? A Study in Length Generalization. Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran. 24 Oct 2023.
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions. Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. 24 Oct 2023.
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages. Andy Yang, David Chiang, Dana Angluin. 21 Oct 2023.
The Expressive Power of Transformers with Chain of Thought. William Merrill, Ashish Sabharwal. 11 Oct 2023. [LRM, AI4CE, ReLM]
Logical Languages Accepted by Transformer Encoders with Hard Attention. Pablo Barceló, A. Kozachinskiy, A. W. Lin, Vladimir Podolskii. 05 Oct 2023.
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? Ari Holtzman, Peter West, Luke Zettlemoyer. 31 Jul 2023. [AI4CE]
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. 21 Jul 2023. [MLT]
Trainable Transformer in Transformer. A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora. 03 Jul 2023. [VLM]
Learning Transformer Programs. Dan Friedman, Alexander Wettig, Danqi Chen. 01 Jun 2023.
Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula. Kirill Brilliantov, Fedor Pavutnitskiy, D. Pasechnyuk, German Magai. 01 Jun 2023.
The Impact of Positional Encoding on Length Generalization in Transformers. Amirhossein Kazemnejad, Inkit Padhi, K. Ramamurthy, Payel Das, Siva Reddy. 31 May 2023.
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. 24 May 2023. [LRM]
Towards Automated Circuit Discovery for Mechanistic Interpretability. Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso. 28 Apr 2023.
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks. Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao. 04 Apr 2023.
Diffusing Graph Attention. Daniel Glickman, Eran Yahav. 01 Mar 2023. [GNN]
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations. Bilal Chughtai, Lawrence Chan, Neel Nanda. 06 Feb 2023.
Looped Transformers as Programmable Computers. Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos. 30 Jan 2023.
Tighter Bounds on the Expressivity of Transformer Encoders. David Chiang, Peter A. Cholak, A. Pillay. 25 Jan 2023.
Tracr: Compiled Transformers as a Laboratory for Interpretability. David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Tom McGrath, Vladimir Mikulik. 12 Jan 2023.
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks. Ankur Sikarwar, Arkil Patel, Navin Goyal. 23 Oct 2022. [ViT]
Transformers Learn Shortcuts to Automata. Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 19 Oct 2022. [OffRL, LRM]
A Logic for Expressing Log-Precision Transformers. William Merrill, Ashish Sabharwal. 06 Oct 2022. [ReLM, NAI, LRM]
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. 01 Aug 2022.