Inductive Biases and Variable Creation in Self-Attention Mechanisms
arXiv:2110.10090 · 19 October 2021
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang

Cited By
Papers citing "Inductive Biases and Variable Creation in Self-Attention Mechanisms" (17 papers shown):

1. Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights. Zhaiming Shen, Alex Havrilla, Rongjie Lai, A. Cloninger, Wenjing Liao. 06 May 2025.
2. Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. 03 Jan 2025.
3. Training Neural Networks as Recognizers of Formal Languages. Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell. [NAI] 11 Nov 2024.
4. Generalizable autoregressive modeling of time series through functional narratives. Ran Liu, Wenrui Ma, Ellen L. Zippi, Hadi Pouransari, Jingyun Xiao, ..., Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer, Ali Moin. [AI4TS] 10 Oct 2024.
5. Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
6. Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers. Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky. 24 May 2024.
7. Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu. 04 Apr 2024.
8. An Information-Theoretic Analysis of In-Context Learning. Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy. 28 Jan 2024.
9. Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? T. Kajitsuka, Issei Sato. 26 Jul 2023.
10. Hungry Hungry Hippos: Towards Language Modeling with State Space Models. Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré. 28 Dec 2022.
11. In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
12. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. 01 Aug 2022.
13. Formal Algorithms for Transformers. Mary Phuong, Marcus Hutter. 19 Jul 2022.
14. Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit. Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang. 18 Jul 2022.
15. MLP-Mixer: An all-MLP Architecture for Vision. Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. 04 May 2021.
16. Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.
17. Norm-Based Capacity Control in Neural Networks. Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. 27 Feb 2015.