Inductive Biases and Variable Creation in Self-Attention Mechanisms (arXiv:2110.10090)
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
19 October 2021
Papers citing "Inductive Biases and Variable Creation in Self-Attention Mechanisms" (50 of 94 papers shown):
Lost in Transmission: When and Why LLMs Fail to Reason Globally. Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville. 13 May 2025.
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights. Zhaiming Shen, Alex Havrilla, Rongjie Lai, A. Cloninger, Wenjing Liao. 06 May 2025.
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective. Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan. 18 Apr 2025.
Approximation Bounds for Transformer Networks with Application to Regression. Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan. 16 Apr 2025.
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective. Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu. 14 Mar 2025.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? Ailin Deng, Tri Cao, Zhirui Chen, Bryan Hooi. 04 Mar 2025.
Reasoning with Latent Thoughts: On the Power of Looped Transformers. Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi. 24 Feb 2025.
Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention. Arya Honarpisheh, Mustafa Bozdag, M. Sznaier, Octavia Camps. 03 Feb 2025.
Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. 03 Jan 2025.
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data. Alex Havrilla, Wenjing Liao. 11 Nov 2024.
Training Neural Networks as Recognizers of Formal Languages. Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell. 11 Nov 2024.
Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric. Baiyuan Chen. 23 Oct 2024.
On Rank-Dependent Generalisation Error Bounds for Transformers. Lan V. Truong. 15 Oct 2024.
Learning Linear Attention in Polynomial Time. Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas. 14 Oct 2024.
Generalizable autoregressive modeling of time series through functional narratives. Ran Liu, Wenrui Ma, Ellen L. Zippi, Hadi Pouransari, Jingyun Xiao, ..., Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer, Ali Moin. 10 Oct 2024.
Large Language Models as Markov Chains. Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, I. Redko. 03 Oct 2024.
On the Inductive Bias of Stacking Towards Improving Reasoning. Nikunj Saunshi, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar. 27 Sep 2024.
Non-asymptotic Convergence of Training Transformers for Next-token Prediction. Ruiquan Huang, Yingbin Liang, Jing Yang. 25 Sep 2024.
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning. Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martin-Martin, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani. 25 Sep 2024.
In-Context Learning with Representations: Contextual Generalization of Trained Transformers. Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi. 19 Aug 2024.
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines. Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore Papineni, Sanjiv Kumar, Andrej Risteski. 22 Jul 2024.
Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors. Tam Thuc Do, Parham Eftekhar, Seyed Alireza Hosseini, Gene Cheung, Philip A. Chou. 06 Jun 2024.
Length independent generalization bounds for deep SSM architectures. Dániel Rácz, M. Petreczky, Bálint Daróczy. 30 May 2024.
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers. Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky. 24 May 2024.
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics. Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell. 07 May 2024.
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu. 04 Apr 2024.
Mechanics of Next Token Prediction with Self-Attention. Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak. 12 Mar 2024.
On the Generalization Ability of Unsupervised Pretraining. Yuyang Deng, Junyuan Hong, Jiayu Zhou, M. Mahdavi. 11 Mar 2024.
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding. Zhenyu (Allen) Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang. 05 Mar 2024.
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers. M. E. Ildiz, Yixiao Huang, Yingcong Li, A. S. Rawat, Samet Oymak. 21 Feb 2024.
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma. 20 Feb 2024.
Why are Sensitive Functions Hard for Transformers? Michael Hahn, Mark Rofin. 15 Feb 2024.
A phase transition between positional and semantic learning in a solvable model of dot-product attention. Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová. 06 Feb 2024.
Attention Meets Post-hoc Interpretability: A Mathematical Perspective. Gianluigi Lopardo, F. Precioso, Damien Garreau. 05 Feb 2024.
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features. Simone Bombari, Marco Mondelli. 05 Feb 2024.
Repeat After Me: Transformers are Better than State Space Models at Copying. Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach. 01 Feb 2024.
An Information-Theoretic Analysis of In-Context Learning. Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy. 28 Jan 2024.
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. 03 Dec 2023.
On the Convergence of Encoder-only Shallow Transformers. Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, V. Cevher. 02 Nov 2023.
Sequence Length Independent Norm-Based Generalization Bounds for Transformers. Jacob Trauger, Ambuj Tewari. 19 Oct 2023.
On the Optimization and Generalization of Multi-head Attention. Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis. 19 Oct 2023.
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression. Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang. 17 Oct 2023.
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du. 01 Oct 2023.
Auto-Regressive Next-Token Predictors are Universal Learners. Eran Malach. 13 Sep 2023.
Breaking through the learning plateaus of in-context learning in Transformer. Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng. 12 Sep 2023.
Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis. Li Du, Yequan Wang, Xingrun Xing, Yiqun Ya, Xiang Li, Xin Jiang, Xuezhi Fang. 11 Sep 2023.
Transformers as Support Vector Machines. Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak. 31 Aug 2023.
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? T. Kajitsuka, Issei Sato. 26 Jul 2023.
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. 21 Jul 2023.