Overcoming a Theoretical Limitation of Self-Attention

24 February 2022
David Chiang, Peter A. Cholak

Papers citing "Overcoming a Theoretical Limitation of Self-Attention"

Showing 50 of 68 citing papers.

Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville
LRM · 13 May 2025

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
02 May 2025

HalluLens: LLM Hallucination Benchmark
Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung
HILM · 24 Apr 2025

Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns
21 Apr 2025

Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan
16 Apr 2025

Unique Hard Attention: A Tale of Two Sides
Selim Jerad, Anej Svete, Jiaoda Li, Ryan Cotterell
18 Mar 2025

Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM · 13 Mar 2025

AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation
Yixiong Fang, Tianran Sun, Yuling Shi, Xiaodong Gu
13 Mar 2025

Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang
24 Feb 2025

HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee, Hayoung Lee, Seongsoo Heo, Wonik Choi
HILM · 12 Feb 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn
LRM · 04 Feb 2025

A completely uniform transformer for parity
A. Kozachinskiy, Tomasz Steifer
07 Jan 2025

Theoretical limitations of multi-layer Transformer
Lijie Chen, Binghui Peng, Hongxun Wu
AI4CE · 04 Dec 2024

Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi, Christopher D. Manning, Shikhar Murty
28 Nov 2024

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
NAI · 11 Nov 2024

BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He
30 Oct 2024

Counting Ability of Large Language Models and Impact of Tokenization
Xiang Zhang, Juntai Cao, Chenyu You
LRM · 25 Oct 2024

Extracting Finite State Machines from Transformers
Rik Adriaensen, Jaron Maene
AI4CE · 08 Oct 2024

Fundamental Limitations on Subquadratic Alternatives to Transformers
Josh Alman, Hantao Yu
05 Oct 2024

ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering
Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su
RALM · LRM · 04 Oct 2024

softmax is not enough (for sharp out-of-distribution)
Petar Veličković, Christos Perivolaropoulos, Federico Barbero, Razvan Pascanu
01 Oct 2024

Improvements to SDXL in NovelAI Diffusion V3
Juan Ossa, Eren Doğan, Alex Birch, F. Johnson
24 Sep 2024

Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, ..., Yong-jia Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
22 Jul 2024

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong
21 Jun 2024

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell
ReLM · LRM · 20 Jun 2024

Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
LRM · 30 May 2024

The Expressive Capacity of State Space Models: A Formal Language Perspective
Yash Sarrof, Yana Veitsman, Michael Hahn
Mamba · 27 May 2024

Rethinking Transformers in Solving POMDPs
Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu
AI4CE · 27 May 2024

LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang, Jing Zhang, Di Wang, Qiming Zhang, Zengmao Wang, Bo Du
16 May 2024

Transformers Can Represent $n$-gram Language Models
Anej Svete, Ryan Cotterell
23 Apr 2024

Length Generalization of Causal Transformers without Position Encoding
Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, Xiaoling Wang
VLM · 18 Apr 2024

LongEmbed: Extending Embedding Models for Long Context Retrieval
Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li
RALM · 18 Apr 2024

TEL'M: Test and Evaluation of Language Models
G. Cybenko, Joshua Ackerman, Paul Lintilhac
ALM · ELM · 16 Apr 2024

MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong, Yanwei Fu
07 Apr 2024

Transformers as Transducers
Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal
02 Apr 2024

Simulating Weighted Automata over Sequences and Trees with Transformers
Michael Rizvi, M. Lizaire, Clara Lacroce, Guillaume Rabusseau
AI4CE · 12 Mar 2024

Why are Sensitive Functions Hard for Transformers?
Michael Hahn, Mark Rofin
15 Feb 2024

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing
30 Jan 2024

MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao, Xinlin Ren, Yanwei Fu
22 Jan 2024

Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu, Sanjay Jain, Mohan S. Kankanhalli
HILM · LRM · 22 Jan 2024

Extending LLMs' Context Window with 100 Samples
Yikai Zhang, Junlong Li, Pengfei Liu
13 Jan 2024

Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition
Chengxin Chen, Pengyuan Zhang
26 Dec 2023

TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
ViT · 28 Nov 2023

Addressing the Length Bias Problem in Document-Level Neural Machine Translation
Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng
20 Nov 2023

The Transient Nature of Emergent In-Context Learning in Transformers
Aaditya K. Singh, Stephanie C. Y. Chan, Ted Moskovitz, Erin Grant, Andrew M. Saxe, Felix Hill
14 Nov 2023

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, ..., Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu
LRM · HILM · 09 Nov 2023

What Formal Languages Can Transformers Express? A Survey
Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin
AI4CE · 01 Nov 2023

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning
AI4CE · 29 Oct 2023

Unraveling Feature Extraction Mechanisms in Neural Networks
Xiaobing Sun, Jiaxi Li, Wei Lu
25 Oct 2023

What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
24 Oct 2023