Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
arXiv:2110.10090, 19 October 2021
Papers citing "Inductive Biases and Variable Creation in Self-Attention Mechanisms" (44 of 94 shown):
- Large Language Models. Michael R Douglas. 11 Jul 2023. (LLMAG, LM&MA)
- Bidirectional Attention as a Mixture of Continuous Word Experts. Kevin Christian Wibisono, Yixin Wang. 08 Jul 2023. (MoE)
- Trainable Transformer in Transformer. A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora. 03 Jul 2023. (VLM)
- H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. Zhenyu (Allen) Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, ..., Yuandong Tian, Christopher Ré, Clark W. Barrett, Zhangyang Wang, Beidi Chen. 24 Jun 2023. (VLM)
- Large Sequence Models for Sequential Decision-Making: A Survey. Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, J. Wang, Haifeng Zhang, Weinan Zhang. 24 Jun 2023. (LM&Ro, LRM)
- Max-Margin Token Selection in Attention Mechanism. Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak. 23 Jun 2023.
- Trained Transformers Learn Linear Models In-Context. Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. 16 Jun 2023.
- Ensembled Prediction Intervals for Causal Outcomes Under Hidden Confounding. Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan, Fred Morstatter. 15 Jun 2023. (CML, OOD)
- FLSL: Feature-level Self-supervised Learning. Qing Su, Anton Netchaev, Hai Helen Li, Shihao Ji. 09 Jun 2023.
- On the Role of Attention in Prompt-tuning. Samet Oymak, A. S. Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis. 06 Jun 2023. (MLT, LRM)
- Representational Strengths and Limitations of Transformers. Clayton Sanford, Daniel J. Hsu, Matus Telgarsky. 05 Jun 2023.
- A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models. Ritwik Sinha, Zhao Song, Tianyi Zhou. 04 Jun 2023.
- Memorization Capacity of Multi-Head Attention in Transformers. Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis. 03 Jun 2023.
- Exposing Attention Glitches with Flip-Flop Language Modeling. Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 01 Jun 2023. (LRM)
- Birth of a Transformer: A Memory Viewpoint. A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou. 01 Jun 2023.
- Transformers learn to implement preconditioned gradient descent for in-context learning. Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra. 01 Jun 2023. (ODL)
- What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization. Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang. 30 May 2023. (BDL)
- Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input. Shokichi Takakura, Taiji Suzuki. 30 May 2023.
- Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. 25 May 2023. (MLT)
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. 24 May 2023. (LRM)
- Learning to Extrapolate: A Transductive Approach. Aviv Netanyahu, Abhishek Gupta, Max Simchowitz, K. Zhang, Pulkit Agrawal. 27 Apr 2023.
- The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou. 26 Apr 2023.
- An Over-parameterized Exponential Regression. Yeqi Gao, Sridhar Mahadevan, Zhao Song. 29 Mar 2023.
- Solving Regularized Exp, Cosh and Sinh Regression Problems. Zhihang Li, Zhao Song, Tianyi Zhou. 28 Mar 2023.
- Do Transformers Parse while Predicting the Masked Word? Haoyu Zhao, A. Panigrahi, Rong Ge, Sanjeev Arora. 14 Mar 2023.
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding. Yuchen Li, Yuanzhi Li, Andrej Risteski. 07 Mar 2023.
- Efficiency 360: Efficient Vision Transformers. Badri N. Patro, Vijay Srinivas Agneeswaran. 16 Feb 2023.
- A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity. Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen. 12 Feb 2023. (ViT, MLT)
- An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models. Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang. 30 Dec 2022.
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models. Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré. 28 Dec 2022.
- Generalizing Multimodal Variational Methods to Sets. Jinzhao Zhou, Yiqun Duan, Zhihong Chen, Yu-Cheng Chang, Chin-Teng Lin. 19 Dec 2022. (DRL)
- Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions. S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom. 22 Nov 2022.
- Transformers Learn Shortcuts to Automata. Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 19 Oct 2022. (OffRL, LRM)
- Vision Transformers provably learn spatial structure. Samy Jelassi, Michael E. Sander, Yuanzhi Li. 13 Oct 2022. (ViT, MLT)
- Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries. Chao Ma, Lexing Ying. 13 Oct 2022.
- In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
- Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL. Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang. 20 Sep 2022. (OffRL, LRM)
- Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms. Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang. 01 Sep 2022.
- What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. 01 Aug 2022.
- Formal Algorithms for Transformers. Mary Phuong, Marcus Hutter. 19 Jul 2022.
- Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit. Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang. 18 Jul 2022.
- MLP-Mixer: An all-MLP Architecture for Vision. Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. 04 May 2021.
- Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.
- Norm-Based Capacity Control in Neural Networks. Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. 27 Feb 2015.