Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
arXiv:2110.10090, 19 October 2021
Papers citing "Inductive Biases and Variable Creation in Self-Attention Mechanisms" (44 of 94 shown):
- Large Language Models. Michael R Douglas. 11 Jul 2023. (LLMAG, LM&MA)
- Bidirectional Attention as a Mixture of Continuous Word Experts. Kevin Christian Wibisono, Yixin Wang. 08 Jul 2023. (MoE)
- Trainable Transformer in Transformer. A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora. 03 Jul 2023. (VLM)
- H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. Zhenyu (Allen) Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, ..., Yuandong Tian, Christopher Ré, Clark W. Barrett, Zhangyang Wang, Beidi Chen. 24 Jun 2023. (VLM)
- Large Sequence Models for Sequential Decision-Making: A Survey. Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, J. Wang, Haifeng Zhang, Weinan Zhang. 24 Jun 2023. (LM&Ro, LRM)
- Max-Margin Token Selection in Attention Mechanism. Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak. 23 Jun 2023.
- Trained Transformers Learn Linear Models In-Context. Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. 16 Jun 2023.
- Ensembled Prediction Intervals for Causal Outcomes Under Hidden Confounding. Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan, Fred Morstatter. 15 Jun 2023. (CML, OOD)
- FLSL: Feature-level Self-supervised Learning. Qing Su, Anton Netchaev, Hai Helen Li, Shihao Ji. 09 Jun 2023.
- On the Role of Attention in Prompt-tuning. Samet Oymak, A. S. Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis. 06 Jun 2023. (MLT, LRM)
- Representational Strengths and Limitations of Transformers. Clayton Sanford, Daniel J. Hsu, Matus Telgarsky. 05 Jun 2023.
- A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models. Ritwik Sinha, Zhao Song, Tianyi Zhou. 04 Jun 2023.
- Memorization Capacity of Multi-Head Attention in Transformers. Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis. 03 Jun 2023.
- Exposing Attention Glitches with Flip-Flop Language Modeling. Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 01 Jun 2023. (LRM)
- Birth of a Transformer: A Memory Viewpoint. A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou. 01 Jun 2023.
- Transformers learn to implement preconditioned gradient descent for in-context learning. Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra. 01 Jun 2023. (ODL)
- What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization. Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang. 30 May 2023. (BDL)
- Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input. Shokichi Takakura, Taiji Suzuki. 30 May 2023.
- Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. 25 May 2023. (MLT)
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. 24 May 2023. (LRM)
- Learning to Extrapolate: A Transductive Approach. Aviv Netanyahu, Abhishek Gupta, Max Simchowitz, K. Zhang, Pulkit Agrawal. 27 Apr 2023.
- The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou. 26 Apr 2023.
- An Over-parameterized Exponential Regression. Yeqi Gao, Sridhar Mahadevan, Zhao Song. 29 Mar 2023.
- Solving Regularized Exp, Cosh and Sinh Regression Problems. Zhihang Li, Zhao Song, Tianyi Zhou. 28 Mar 2023.
- Do Transformers Parse while Predicting the Masked Word? Haoyu Zhao, A. Panigrahi, Rong Ge, Sanjeev Arora. 14 Mar 2023.
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding. Yuchen Li, Yuanzhi Li, Andrej Risteski. 07 Mar 2023.
- Efficiency 360: Efficient Vision Transformers. Badri N. Patro, Vijay Srinivas Agneeswaran. 16 Feb 2023.
- A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity. Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen. 12 Feb 2023. (ViT, MLT)
- An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models. Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang. 30 Dec 2022.
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models. Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré. 28 Dec 2022.
- Generalizing Multimodal Variational Methods to Sets. Jinzhao Zhou, Yiqun Duan, Zhihong Chen, Yu-Cheng Chang, Chin-Teng Lin. 19 Dec 2022. (DRL)
- Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions. S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom. 22 Nov 2022.
- Transformers Learn Shortcuts to Automata. Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 19 Oct 2022. (OffRL, LRM)
- Vision Transformers provably learn spatial structure. Samy Jelassi, Michael E. Sander, Yuanzhi Li. 13 Oct 2022. (ViT, MLT)
- Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries. Chao Ma, Lexing Ying. 13 Oct 2022.
- In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
- Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL. Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang. 20 Sep 2022. (OffRL, LRM)
- Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms. Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang. 01 Sep 2022.
- What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. 01 Aug 2022.
- Formal Algorithms for Transformers. Mary Phuong, Marcus Hutter. 19 Jul 2022.
- Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit. Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang. 18 Jul 2022.
- MLP-Mixer: An all-MLP Architecture for Vision. Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. 04 May 2021.
- Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.
- Norm-Based Capacity Control in Neural Networks. Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. 27 Feb 2015.