On Limitations of the Transformer Architecture (arXiv:2402.08164)

13 February 2024
Binghui Peng, Srini Narayanan, Christos H. Papadimitriou

Papers citing "On Limitations of the Transformer Architecture" (28 papers shown)

  1. Lost in Transmission: When and Why LLMs Fail to Reason Globally
     Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville [LRM] (13 May 2025)
  2. Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents
     Schaun Wheeler, Olivier Jeunen [LLMAG] (06 May 2025)
  3. Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)
     Lena Strobl, Dana Angluin, Robert Frank (28 Mar 2025)
  4. Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
     Aabid Karim, Abdul Karim, Bhoomika Lohana, Matt Keon, Jaswinder Singh, A. Sattar (23 Mar 2025)
  5. The Role of Sparsity for Length Generalization in Transformers
     Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach (24 Feb 2025)
  6. Provably Overwhelming Transformer Models with Designed Inputs
     Lev Stambler, Seyed Sajjad Nezhadi, Matthew Coudron (09 Feb 2025)
  7. Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
     Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn [LRM] (04 Feb 2025)
  8. Strassen Attention: Unlocking Compositional Abilities in Transformers Based on a New Lower Bound Method
     A. Kozachinskiy, Felipe Urrutia, Hector Jimenez, Tomasz Steifer, Germán Pizarro, Matías Fuentes, Francisco Meza, Cristian Buc, Cristóbal Rojas (31 Jan 2025)
  9. Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
     Yutong Yin, Zhaoran Wang [LRM, ReLM] (27 Jan 2025)
 10. Ehrenfeucht-Haussler Rank and Chain of Thought
     Pablo Barceló, A. Kozachinskiy, Tomasz Steifer [LRM] (22 Jan 2025)
 11. A completely uniform transformer for parity
     A. Kozachinskiy, Tomasz Steifer (07 Jan 2025)
 12. Lower bounds on transformers with infinite precision
     Alexander Kozachinskiy (31 Dec 2024)
 13. Theoretical limitations of multi-layer Transformer
     Lijie Chen, Binghui Peng, Hongxun Wu [AI4CE] (04 Dec 2024)
 14. The Asymptotic Behavior of Attention in Transformers
     Álvaro Rodríguez Abella, João Pedro Silvestre, Paulo Tabuada (03 Dec 2024)
 15. Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
     Sohee Yang, Nora Kassner, E. Gribovskaya, Sebastian Riedel, Mor Geva [KELM, LRM, ReLM] (25 Nov 2024)
 16. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
     Shreya Shankar, Tristan Chambers, Eugene Wu, Aditya G. Parameswaran [LLMAG] (16 Oct 2024)
 17. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
     Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar [AIMat, LRM] (07 Oct 2024)
 18. Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis
     Luka Andrenšek, Boshko Koloski, Andraz Pelicon, Nada Lavrac, Senja Pollak, Matthew Purver (30 Sep 2024)
 19. Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
     Yotam Wolf, Binyamin Rothberg, Dorin Shteyman, Amnon Shashua (26 Sep 2024)
 20. One-layer transformers fail to solve the induction heads task
     Clayton Sanford, Daniel J. Hsu, Matus Telgarsky (26 Aug 2024)
 21. Can Large Language Models Reason? A Characterization via 3-SAT
     Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires, Luc de Raedt [ELM, ReLM, LRM] (13 Aug 2024)
 22. When Can Transformers Count to n?
     Gilad Yehudai, Haim Kaplan, Asma Ghandeharioun, Mor Geva, Amir Globerson (21 Jul 2024)
 23. On the Design and Analysis of LLM-Based Algorithms
     Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou (20 Jul 2024)
 24. Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model
     Doyoung Kim, Jongwon Lee, Jinho Park, Minjoon Seo [LM&Ro] (21 Jun 2024)
 25. The Expressive Capacity of State Space Models: A Formal Language Perspective
     Yash Sarrof, Yana Veitsman, Michael Hahn [Mamba] (27 May 2024)
 26. Large Language Models for UAVs: Current State and Pathways to the Future
     Shumaila Javaid, Nasir Saeed, Bin He (02 May 2024)
 27. Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
     Badri N. Patro, Vijay Srinivas Agneeswaran [Mamba] (24 Apr 2024)
 28. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
     Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou [LM&Ro, LRM, AI4CE, ReLM] (28 Jan 2022)