On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal. arXiv:2006.09286 (v3). 16 June 2020.

Papers citing "On the Computational Power of Transformers and its Implications in Sequence Modeling" (50 of 67 shown)

Exact Learning of Arithmetic with Differentiable Agents
Hristo Papazov, Francesco D'Angelo, Nicolas Flammarion. 27 Nov 2025.

Softmax Transformers are Turing-Complete
Hongjian Jiang, Michael Hahn, Georg Zetzsche, Anthony Widjaja Lin. 25 Nov 2025. [LRM]

RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han. 10 Oct 2025. [LRM]

The Role of Logic and Automata in Understanding Transformers
Anthony Widjaja Lin, Pablo Barcelo. 28 Sep 2025. [AI4CE]

Efficient Turing Machine Simulation with Transformers
Qian Li, Yuyi Wang. 28 Sep 2025. [LRM]

Is In-Context Learning Learning?
Adrian de Wynter. 12 Sep 2025.

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, ..., Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov, Mikhail Burtsev. 22 Aug 2025. [LRM]

Sequential-Parallel Duality in Prefix Scannable Models
Morris Yau, Sharut Gupta, Valerie Engelmayer, Kazuki Irie, Stefanie Jegelka, Jacob Andreas. 12 Jun 2025.

Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques
Asankhaya Sharma. 09 Jun 2025.

Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Baihe Huang, Shanda Li, Tianhao Wu, Yiming Yang, Ameet Talwalkar, Kannan Ramchandran, Michael I. Jordan, Jiantao Jiao. 05 Jun 2025. [LRM]

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian. 18 May 2025. [OffRL, LRM]

Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Chris Köcher. 15 May 2025.

Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework
Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar S. Namjoshi. 28 Apr 2025. [LRM, ELM]

Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability
Chenhui Xu, Dancheng Liu, Jiajie Li, Amir Nassereldine, Zhaohui Li, Jinjun Xiong. 05 Mar 2025. [LRM]

Ask, and it shall be given: On the Turing completeness of prompting
Ruizhong Qiu, Zhe Xu, Wenxuan Bao, Hanghang Tong. ICLR 2024. 24 Feb 2025. [ReLM, LRM, AI4CE]

Transformers versus the EM Algorithm in Multi-class Clustering
Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu. 09 Feb 2025.

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn. 04 Feb 2025. [LRM]

Learning Elementary Cellular Automata with Transformers
Mikhail Burtsev. 02 Dec 2024.

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Robert Bamler, Brian DuSell. ICLR 2024. 11 Nov 2024. [NAI]

Autoregressive Large Language Models are Computationally Universal
Dale Schuurmans, Hanjun Dai, Francesco Zanini. 04 Oct 2024.

Transformers As Approximations of Solomonoff Induction
Nathan Young, Michael Witbrock. ICONIP 2024. 22 Aug 2024.

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.

DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
Wenhui Zhu, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang. 04 Jul 2024.

Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach. 03 Jul 2024.

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Robert Bamler. 20 Jun 2024. [ReLM, LRM]

[WIP] Jailbreak Paradox: The Achilles' Heel of LLMs
Abhinav Rao, Monojit Choudhury, Somak Aditya. 18 Jun 2024.

Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade. 13 Jun 2024. [GNN]

NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu. 05 Jun 2024. [LRM]

Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning
Marco Sälzer, Eric Alsmann, Martin Lange. 28 May 2024. [LRM]

Rethinking Transformers in Solving POMDPs
Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu. 27 May 2024. [AI4CE]

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić, Federico Soldá, Aurelio Sulser, Davide Scaramuzza. 26 May 2024. [LRM, BDL]

Models That Prove Their Own Correctness
Noga Amit, S. Goldwasser, Orr Paradise, G. Rothblum. 24 May 2024. [LRM]

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell. 07 May 2024. [LRM, AI4CE]

Do Large Language Models Learn Human-Like Strategic Preferences?
Jesse Roberts, Kyle Moore, Douglas H. Fisher. 11 Apr 2024.

Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion
Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer. 23 Jan 2024.

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. NeurIPS 2023. 03 Dec 2023.

What Formal Languages Can Transformers Express? A Survey
Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin. TACL 2023. 01 Nov 2023. [AI4CE]

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. EMNLP 2023. 24 Oct 2023.

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Licong Lin, Yu Bai, Song Mei. ICLR 2023. 12 Oct 2023. [OffRL]

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du. ICLR 2023. 01 Oct 2023.

Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Shunjie Wang, Shane Steinert-Threlkeld. BlackboxNLP 2023. 02 Sep 2023.

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. NeurIPS 2023. 21 Jul 2023. [MLT]

Transformers in Reinforcement Learning: A Survey
Pranav Agarwal, A. Rahman, P. St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou. 12 Jul 2023. [OffRL]

Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. JMLR 2023. 16 Jun 2023.

A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha, Zhao Song, Wanrong Zhu. 04 Jun 2023.

How Powerful are Decoder-Only Transformer Neural Models?
Jesse Roberts. IJCNN 2023. 26 May 2023. [BDL]

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. NeurIPS 2023. 25 May 2023. [MLT]

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. NeurIPS 2023. 24 May 2023. [LRM]

Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge. ACL 2023. 23 May 2023. [VLM, MILM]

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li, Zhao Song, Yu Xia, Tong Yu, Wanrong Zhu. NeurIPS 2023. 26 Apr 2023.