Overcoming a Theoretical Limitation of Self-Attention
David Chiang, Peter A. Cholak
arXiv 2202.12172 · 24 February 2022
Papers citing "Overcoming a Theoretical Limitation of Self-Attention" (50 of 68 shown)
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville · LRM · 13 May 2025

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang · 02 May 2025

HalluLens: LLM Hallucination Benchmark
Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung · HILM · 24 Apr 2025

Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns · 21 Apr 2025

Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan · 16 Apr 2025

Unique Hard Attention: A Tale of Two Sides
Selim Jerad, Anej Svete, Jiaoda Li, Ryan Cotterell · 18 Mar 2025

Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund · LRM · 13 Mar 2025

AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation
Yixiong Fang, Tianran Sun, Yuling Shi, Xiaodong Gu · 13 Mar 2025

Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang · 24 Feb 2025

HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee, Hayoung Lee, Seongsoo Heo, Wonik Choi · HILM · 12 Feb 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn · LRM · 04 Feb 2025

A completely uniform transformer for parity
A. Kozachinskiy, Tomasz Steifer · 07 Jan 2025

Theoretical limitations of multi-layer Transformer
Lijie Chen, Binghui Peng, Hongxun Wu · AI4CE · 04 Dec 2024

Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi, Christopher D. Manning, Shikhar Murty · 28 Nov 2024

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell · NAI · 11 Nov 2024

BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He · 30 Oct 2024

Counting Ability of Large Language Models and Impact of Tokenization
Xiang Zhang, Juntai Cao, Chenyu You · LRM · 25 Oct 2024

Extracting Finite State Machines from Transformers
Rik Adriaensen, Jaron Maene · AI4CE · 08 Oct 2024

Fundamental Limitations on Subquadratic Alternatives to Transformers
Josh Alman, Hantao Yu · 05 Oct 2024

ALR²: A Retrieve-then-Reason Framework for Long-context Question Answering
Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su · RALM, LRM · 04 Oct 2024

softmax is not enough (for sharp out-of-distribution)
Petar Veličković, Christos Perivolaropoulos, Federico Barbero, Razvan Pascanu · 01 Oct 2024

Improvements to SDXL in NovelAI Diffusion V3
Juan Ossa, Eren Doğan, Alex Birch, F. Johnson · 24 Sep 2024

Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, …, Yong-jia Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang · 22 Jul 2024

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong · 21 Jun 2024

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell · ReLM, LRM · 20 Jun 2024

Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk · LRM · 30 May 2024

The Expressive Capacity of State Space Models: A Formal Language Perspective
Yash Sarrof, Yana Veitsman, Michael Hahn · Mamba · 27 May 2024

Rethinking Transformers in Solving POMDPs
Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu · AI4CE · 27 May 2024

LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang, Jing Zhang, Di Wang, Qiming Zhang, Zengmao Wang, Bo Du · 16 May 2024

Transformers Can Represent n-gram Language Models
Anej Svete, Ryan Cotterell · 23 Apr 2024

Length Generalization of Causal Transformers without Position Encoding
Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, Xiaoling Wang · VLM · 18 Apr 2024

LongEmbed: Extending Embedding Models for Long Context Retrieval
Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li · RALM · 18 Apr 2024

TEL'M: Test and Evaluation of Language Models
G. Cybenko, Joshua Ackerman, Paul Lintilhac · ALM, ELM · 16 Apr 2024

MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong, Yanwei Fu · 07 Apr 2024

Transformers as Transducers
Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal · 02 Apr 2024

Simulating Weighted Automata over Sequences and Trees with Transformers
Michael Rizvi, M. Lizaire, Clara Lacroce, Guillaume Rabusseau · AI4CE · 12 Mar 2024

Why are Sensitive Functions Hard for Transformers?
Michael Hahn, Mark Rofin · 15 Feb 2024

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing · 30 Jan 2024

MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao, Xinlin Ren, Yanwei Fu · 22 Jan 2024

Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu, Sanjay Jain, Mohan S. Kankanhalli · HILM, LRM · 22 Jan 2024

Extending LLMs' Context Window with 100 Samples
Yikai Zhang, Junlong Li, Pengfei Liu · 13 Jan 2024

Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition
Chengxin Chen, Pengyuan Zhang · 26 Dec 2023

TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi · ViT · 28 Nov 2023

Addressing the Length Bias Problem in Document-Level Neural Machine Translation
Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng · 20 Nov 2023

The Transient Nature of Emergent In-Context Learning in Transformers
Aaditya K. Singh, Stephanie C. Y. Chan, Ted Moskovitz, Erin Grant, Andrew M. Saxe, Felix Hill · 14 Nov 2023

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, …, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu · LRM, HILM · 09 Nov 2023

What Formal Languages Can Transformers Express? A Survey
Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin · AI4CE · 01 Nov 2023

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning · AI4CE · 29 Oct 2023

Unraveling Feature Extraction Mechanisms in Neural Networks
Xiaobing Sun, Jiaxi Li, Wei Lu · 25 Oct 2023

What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran · 24 Oct 2023