Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
16 June 2019
ArXiv · PDF · HTML

Papers citing "Theoretical Limitations of Self-Attention in Neural Sequence Models"

45 papers shown
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
02 May 2025

HalluLens: LLM Hallucination Benchmark
Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung
24 Apr 2025 · HILM

What makes a good feedforward computational graph?
Alex Vitvitskyi, J. G. Araújo, Marc Lackenby, Petar Velickovic
10 Feb 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn
04 Feb 2025 · LRM

Strassen Attention: Unlocking Compositional Abilities in Transformers Based on a New Lower Bound Method
A. Kozachinskiy, Felipe Urrutia, Hector Jimenez, Tomasz Steifer, Germán Pizarro, Matías Fuentes, Francisco Meza, Cristian Buc, Cristóbal Rojas
31 Jan 2025

Lower bounds on transformers with infinite precision
Alexander Kozachinskiy
31 Dec 2024

Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi, Christopher D. Manning, Shikhar Murty
28 Nov 2024

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Riccardo Grazzi, Julien N. Siems, Jörg K.H. Franke, Arber Zela, Frank Hutter, Massimiliano Pontil
19 Nov 2024

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
11 Nov 2024 · NAI

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong
21 Jun 2024

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell
20 Jun 2024 · ReLM, LRM

Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade
13 Jun 2024 · GNN

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Nadav Borenstein, Anej Svete, R. Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell
06 Jun 2024

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky
24 May 2024

Transformers as Transducers
Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal
02 Apr 2024

Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Tian Xu, Yun Fu
31 Mar 2024

Topology-Informed Graph Transformer
Yuncheol Choi, Sun Woo Park, Minho Lee, Youngho Woo
03 Feb 2024

Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu, Sanjay Jain, Mohan S. Kankanhalli
22 Jan 2024 · HILM, LRM

On The Expressivity of Recurrent Neural Cascades
Nadezda A. Knorozova, Alessandro Ronca
14 Dec 2023

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
07 Sep 2023

Using Artificial Populations to Study Psychological Phenomena in Neural Models
Jesse Roberts, Kyle Moore, Drew Wilenzick, Doug Fisher
15 Aug 2023

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
26 Jul 2023

Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jian, ..., Sean Welleck, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, Yejin Choi
29 May 2023 · ReLM, LRM

TAPIR: Learning Adaptive Revision for Incremental Natural Language Understanding with a Two-Pass Model
Patrick Kahardipraja, Brielen Madureira, David Schlangen
18 May 2023 · CLL

Models of symbol emergence in communication: a conceptual review and a guide for avoiding local minima
Julian Zubek, Tomasz Korbak, J. Rączaszek-Leonardi
08 Mar 2023

Adaptive Computation with Elastic Input Sequence
Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, N. Houlsby, Mostafa Dehghani, Yang You
30 Jan 2023

Tighter Bounds on the Expressivity of Transformer Encoders
David Chiang, Peter A. Cholak, A. Pillay
25 Jan 2023

Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs
Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner
17 Jan 2023

Structural generalization is hard for sequence-to-sequence models
Yuekun Yao, Alexander Koller
24 Oct 2022

Transformers Learn Shortcuts to Automata
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang
19 Oct 2022 · OffRL, LRM

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
18 Jul 2022

Impartial Games: A Challenge for Reinforcement Learning
Bei Zhou, Søren Riis
25 May 2022

Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Sophie Hao, Dana Angluin, Robert Frank
13 Apr 2022

Overcoming a Theoretical Limitation of Self-Attention
David Chiang, Peter A. Cholak
24 Feb 2022

Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
19 Oct 2021

Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots
Jason Wu, Xiaoyi Zhang, Jeffrey Nichols, Jeffrey P. Bigham
17 Sep 2021 · 3DV

What Context Features Can Transformer Language Models Use?
J. O'Connor, Jacob Andreas
15 Jun 2021 · KELM

Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav
13 Jun 2021 · AI4CE

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
11 Jun 2021

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu
28 Mar 2021 · SSL

Formal Language Theory Meets Modern NLP
William Merrill
19 Feb 2021 · AI4CE, NAI

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
16 Jun 2020

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
06 Jun 2016