On the Computational Power of Transformers and its Implications in Sequence Modeling (arXiv:2006.09286)
16 June 2020
S. Bhattamishra
Arkil Patel
Navin Goyal
Papers citing "On the Computational Power of Transformers and its Implications in Sequence Modeling" (50 of 67 papers shown)
Exact Learning of Arithmetic with Differentiable Agents. Hristo Papazov, Francesco D'Angelo, Nicolas Flammarion. 27 Nov 2025.
Softmax Transformers are Turing-Complete. Hongjian Jiang, Michael Hahn, Georg Zetzsche, Anthony Widjaja Lin. 25 Nov 2025.
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems. Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han. 10 Oct 2025.
The Role of Logic and Automata in Understanding Transformers. Anthony Widjaja Lin, Pablo Barcelo. 28 Sep 2025.
Efficient Turing Machine Simulation with Transformers. Qian Li, Yuyi Wang. 28 Sep 2025.
Is In-Context Learning Learning? Adrian de Wynter. 12 Sep 2025.
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling. Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, ..., Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov, Mikhail Burtsev. 22 Aug 2025.
Sequential-Parallel Duality in Prefix Scannable Models. Morris Yau, Sharut Gupta, Valerie Engelmayer, Kazuki Irie, Stefanie Jegelka, Jacob Andreas. 12 Jun 2025.
Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques. Asankhaya Sharma. 09 Jun 2025.
Sample Complexity and Representation Ability of Test-time Scaling Paradigms. Baihe Huang, Shanda Li, Tianhao Wu, Yiming Yang, Ameet Talwalkar, Kannan Ramchandran, Michael I. Jordan, Jiantao Jiao. 05 Jun 2025.
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought. Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian. 18 May 2025.
Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models. Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Chris Köcher. 15 May 2025.
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework. Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar S. Namjoshi. 28 Apr 2025.
Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability. Chenhui Xu, Dancheng Liu, Jiajie Li, Amir Nassereldine, Zhaohui Li, Jinjun Xiong. 05 Mar 2025.
Ask, and it shall be given: On the Turing completeness of prompting. Ruizhong Qiu, Zhe Xu, Wenxuan Bao, Hanghang Tong. International Conference on Learning Representations (ICLR), 2024. 24 Feb 2025.
Transformers versus the EM Algorithm in Multi-class Clustering. Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu. 09 Feb 2025.
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers. Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn. 04 Feb 2025.
Learning Elementary Cellular Automata with Transformers. Mikhail Burtsev. 02 Dec 2024.
Training Neural Networks as Recognizers of Formal Languages. Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Robert Bamler, Brian DuSell. International Conference on Learning Representations (ICLR), 2024. 11 Nov 2024.
Autoregressive Large Language Models are Computationally Universal. Dale Schuurmans, Hanjun Dai, Francesco Zanini. 04 Oct 2024.
Transformers As Approximations of Solomonoff Induction. Nathan Young, Michael Witbrock. International Conference on Neural Information Processing (ICONIP), 2024. 22 Aug 2024.
Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification. Wenhui Zhu, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang. 04 Jul 2024.
Universal Length Generalization with Turing Programs. Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach. 03 Jul 2024.
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning. Franz Nowak, Anej Svete, Alexandra Butoi, Robert Bamler. 20 Jun 2024.
[WIP] Jailbreak Paradox: The Achilles' Heel of LLMs. Abhinav Rao, Monojit Choudhury, Somak Aditya. 18 Jun 2024.
Separations in the Representational Capabilities of Transformers and Recurrent Architectures. S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade. 13 Jun 2024.
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models. Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu. 05 Jun 2024.
Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning. Marco Sälzer, Eric Alsmann, Martin Lange. 28 May 2024.
Rethinking Transformers in Solving POMDPs. Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu. 27 May 2024.
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory. Nikola Zubić, Federico Soldá, Aurelio Sulser, Davide Scaramuzza. 26 May 2024.
Models That Prove Their Own Correctness. Noga Amit, S. Goldwasser, Orr Paradise, G. Rothblum. 24 May 2024.
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics. Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell. 07 May 2024.
Do Large Language Models Learn Human-Like Strategic Preferences? Jesse Roberts, Kyle Moore, Douglas H. Fisher. 11 Apr 2024.
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion. Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer. 23 Jan 2024.
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. Neural Information Processing Systems (NeurIPS), 2023. 03 Dec 2023.
What Formal Languages Can Transformers Express? A Survey. Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin. Transactions of the Association for Computational Linguistics (TACL), 2023. 01 Nov 2023.
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions. Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 24 Oct 2023.
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining. Licong Lin, Yu Bai, Song Mei. International Conference on Learning Representations (ICLR), 2023. 12 Oct 2023.
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du. International Conference on Learning Representations (ICLR), 2023. 01 Oct 2023.
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages. Shunjie Wang, Shane Steinert-Threlkeld. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023. 02 Sep 2023.
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. Neural Information Processing Systems (NeurIPS), 2023. 21 Jul 2023.
Transformers in Reinforcement Learning: A Survey. Pranav Agarwal, A. Rahman, P. St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou. 12 Jul 2023.
Trained Transformers Learn Linear Models In-Context. Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. Journal of Machine Learning Research (JMLR), 2023. 16 Jun 2023.
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models. Ritwik Sinha, Zhao Song, Wanrong Zhu. 04 Jun 2023.
How Powerful are Decoder-Only Transformer Neural Models? Jesse Roberts. IEEE International Joint Conference on Neural Networks (IJCNN), 2023. 26 May 2023.
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. Neural Information Processing Systems (NeurIPS), 2023. 25 May 2023.
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. Neural Information Processing Systems (NeurIPS), 2023. 24 May 2023.
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings. Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 23 May 2023.
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. Shuai Li, Zhao Song, Yu Xia, Tong Yu, Wanrong Zhu. Neural Information Processing Systems (NeurIPS), 2023. 26 Apr 2023.