On the Computational Power of Transformers and its Implications in Sequence Modeling (arXiv:2006.09286)
16 June 2020
S. Bhattamishra
Arkil Patel
Navin Goyal
Papers citing "On the Computational Power of Transformers and its Implications in Sequence Modeling" (50 of 67 papers shown)
Exact Learning of Arithmetic with Differentiable Agents. Hristo Papazov, Francesco D'Angelo, Nicolas Flammarion. 27 Nov 2025.
Softmax Transformers are Turing-Complete. Hongjian Jiang, Michael Hahn, Georg Zetzsche, Anthony Widjaja Lin. 25 Nov 2025.
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems. Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han. 10 Oct 2025.
The Role of Logic and Automata in Understanding Transformers. Anthony Widjaja Lin, Pablo Barcelo. 28 Sep 2025.
Efficient Turing Machine Simulation with Transformers. Qian Li, Yuyi Wang. 28 Sep 2025.
Is In-Context Learning Learning? Adrian de Wynter. 12 Sep 2025.
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling. Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, ..., Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov, Mikhail Burtsev. 22 Aug 2025.
Sequential-Parallel Duality in Prefix Scannable Models. Morris Yau, Sharut Gupta, Valerie Engelmayer, Kazuki Irie, Stefanie Jegelka, Jacob Andreas. 12 Jun 2025.
Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques. Asankhaya Sharma. 09 Jun 2025.
Sample Complexity and Representation Ability of Test-time Scaling Paradigms. Baihe Huang, Shanda Li, Tianhao Wu, Yiming Yang, Ameet Talwalkar, Kannan Ramchandran, Michael I. Jordan, Jiantao Jiao. 05 Jun 2025.
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought. Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian. 18 May 2025.
Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models. Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Chris Köcher. 15 May 2025.
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework. Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar S. Namjoshi. 28 Apr 2025.
Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability. Chenhui Xu, Dancheng Liu, Jiajie Li, Amir Nassereldine, Zhaohui Li, Jinjun Xiong. 05 Mar 2025.
Ask, and it shall be given: On the Turing completeness of prompting. Ruizhong Qiu, Zhe Xu, Wenxuan Bao, Hanghang Tong. International Conference on Learning Representations (ICLR), 2024. 24 Feb 2025.
Transformers versus the EM Algorithm in Multi-class Clustering. Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu. 09 Feb 2025.
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers. Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn. 04 Feb 2025.
Learning Elementary Cellular Automata with Transformers. Mikhail Burtsev. 02 Dec 2024.
Training Neural Networks as Recognizers of Formal Languages. Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Robert Bamler, Brian DuSell. International Conference on Learning Representations (ICLR), 2024. 11 Nov 2024.
Autoregressive Large Language Models are Computationally Universal. Dale Schuurmans, Hanjun Dai, Francesco Zanini. 04 Oct 2024.
Transformers As Approximations of Solomonoff Induction. Nathan Young, Michael Witbrock. International Conference on Neural Information Processing (ICONIP), 2024. 22 Aug 2024.
Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification. Wenhui Zhu, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang. 04 Jul 2024.
Universal Length Generalization with Turing Programs. Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach. 03 Jul 2024.
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning. Franz Nowak, Anej Svete, Alexandra Butoi, Robert Bamler. 20 Jun 2024.
[WIP] Jailbreak Paradox: The Achilles' Heel of LLMs. Abhinav Rao, Monojit Choudhury, Somak Aditya. 18 Jun 2024.
Separations in the Representational Capabilities of Transformers and Recurrent Architectures. S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade. 13 Jun 2024.
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models. Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu. 05 Jun 2024.
Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning. Marco Sälzer, Eric Alsmann, Martin Lange. 28 May 2024.
Rethinking Transformers in Solving POMDPs. Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu. 27 May 2024.
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory. Nikola Zubić, Federico Soldá, Aurelio Sulser, Davide Scaramuzza. 26 May 2024.
Models That Prove Their Own Correctness. Noga Amit, S. Goldwasser, Orr Paradise, G. Rothblum. 24 May 2024.
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics. Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell. 07 May 2024.
Do Large Language Models Learn Human-Like Strategic Preferences? Jesse Roberts, Kyle Moore, Douglas H. Fisher. 11 Apr 2024.
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion. Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer. 23 Jan 2024.
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. Neural Information Processing Systems (NeurIPS), 2023. 03 Dec 2023.
What Formal Languages Can Transformers Express? A Survey. Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin. Transactions of the Association for Computational Linguistics (TACL), 2023. 01 Nov 2023.
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions. Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 24 Oct 2023.
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining. Licong Lin, Yu Bai, Song Mei. International Conference on Learning Representations (ICLR), 2023. 12 Oct 2023.
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du. International Conference on Learning Representations (ICLR), 2023. 01 Oct 2023.
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages. Shunjie Wang, Shane Steinert-Threlkeld. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023. 02 Sep 2023.
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. Neural Information Processing Systems (NeurIPS), 2023. 21 Jul 2023.
Transformers in Reinforcement Learning: A Survey. Pranav Agarwal, A. Rahman, P. St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou. 12 Jul 2023.
Trained Transformers Learn Linear Models In-Context. Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. Journal of Machine Learning Research (JMLR), 2023. 16 Jun 2023.
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models. Ritwik Sinha, Zhao Song, Wanrong Zhu. 04 Jun 2023.
How Powerful are Decoder-Only Transformer Neural Models? Jesse Roberts. IEEE International Joint Conference on Neural Networks (IJCNN), 2023. 26 May 2023.
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. Neural Information Processing Systems (NeurIPS), 2023. 25 May 2023.
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. Neural Information Processing Systems (NeurIPS), 2023. 24 May 2023.
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings. Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 23 May 2023.
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. Shuai Li, Zhao Song, Yu Xia, Tong Yu, Wanrong Zhu. Neural Information Processing Systems (NeurIPS), 2023. 26 Apr 2023.