Inductive Biases and Variable Creation in Self-Attention Mechanisms (arXiv:2110.10090)
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
19 October 2021
Papers citing "Inductive Biases and Variable Creation in Self-Attention Mechanisms" (50 of 94 papers shown):
Lost in Transmission: When and Why LLMs Fail to Reason Globally. Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville. 13 May 2025.
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights. Zhaiming Shen, Alex Havrilla, Rongjie Lai, A. Cloninger, Wenjing Liao. 06 May 2025.
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective. Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan. 18 Apr 2025.
Approximation Bounds for Transformer Networks with Application to Regression. Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan. 16 Apr 2025.
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective. Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu. 14 Mar 2025.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? Ailin Deng, Tri Cao, Zhirui Chen, Bryan Hooi. 04 Mar 2025.
Reasoning with Latent Thoughts: On the Power of Looped Transformers. Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi. 24 Feb 2025.
Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention. Arya Honarpisheh, Mustafa Bozdag, M. Sznaier, Octavia Camps. 03 Feb 2025.
Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. 03 Jan 2025.
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data. Alex Havrilla, Wenjing Liao. 11 Nov 2024.
Training Neural Networks as Recognizers of Formal Languages. Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell. 11 Nov 2024.
Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric. Baiyuan Chen. 23 Oct 2024.
On Rank-Dependent Generalisation Error Bounds for Transformers. Lan V. Truong. 15 Oct 2024.
Learning Linear Attention in Polynomial Time. Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas. 14 Oct 2024.
Generalizable autoregressive modeling of time series through functional narratives. Ran Liu, Wenrui Ma, Ellen L. Zippi, Hadi Pouransari, Jingyun Xiao, ..., Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer, Ali Moin. 10 Oct 2024.
Large Language Models as Markov Chains. Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, I. Redko. 03 Oct 2024.
On the Inductive Bias of Stacking Towards Improving Reasoning. Nikunj Saunshi, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar. 27 Sep 2024.
Non-asymptotic Convergence of Training Transformers for Next-token Prediction. Ruiquan Huang, Yingbin Liang, Jing Yang. 25 Sep 2024.
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning. Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martin-Martin, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani. 25 Sep 2024.
In-Context Learning with Representations: Contextual Generalization of Trained Transformers. Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi. 19 Aug 2024.
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines. Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore Papineni, Sanjiv Kumar, Andrej Risteski. 22 Jul 2024.
Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors. Tam Thuc Do, Parham Eftekhar, Seyed Alireza Hosseini, Gene Cheung, Philip A. Chou. 06 Jun 2024.
Length independent generalization bounds for deep SSM architectures. Dániel Rácz, M. Petreczky, Bálint Daróczy. 30 May 2024.
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers. Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky. 24 May 2024.
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics. Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell. 07 May 2024.
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu. 04 Apr 2024.
Mechanics of Next Token Prediction with Self-Attention. Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak. 12 Mar 2024.
On the Generalization Ability of Unsupervised Pretraining. Yuyang Deng, Junyuan Hong, Jiayu Zhou, M. Mahdavi. 11 Mar 2024.
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding. Zhenyu (Allen) Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang. 05 Mar 2024.
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers. M. E. Ildiz, Yixiao Huang, Yingcong Li, A. S. Rawat, Samet Oymak. 21 Feb 2024.
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma. 20 Feb 2024.
Why are Sensitive Functions Hard for Transformers? Michael Hahn, Mark Rofin. 15 Feb 2024.
A phase transition between positional and semantic learning in a solvable model of dot-product attention. Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová. 06 Feb 2024.
Attention Meets Post-hoc Interpretability: A Mathematical Perspective. Gianluigi Lopardo, F. Precioso, Damien Garreau. 05 Feb 2024.
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features. Simone Bombari, Marco Mondelli. 05 Feb 2024.
Repeat After Me: Transformers are Better than State Space Models at Copying. Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach. 01 Feb 2024.
An Information-Theoretic Analysis of In-Context Learning. Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy. 28 Jan 2024.
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. 03 Dec 2023.
On the Convergence of Encoder-only Shallow Transformers. Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, V. Cevher. 02 Nov 2023.
Sequence Length Independent Norm-Based Generalization Bounds for Transformers. Jacob Trauger, Ambuj Tewari. 19 Oct 2023.
On the Optimization and Generalization of Multi-head Attention. Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis. 19 Oct 2023.
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression. Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang. 17 Oct 2023.
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du. 01 Oct 2023.
Auto-Regressive Next-Token Predictors are Universal Learners. Eran Malach. 13 Sep 2023.
Breaking through the learning plateaus of in-context learning in Transformer. Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng. 12 Sep 2023.
Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis. Li Du, Yequan Wang, Xingrun Xing, Yiqun Ya, Xiang Li, Xin Jiang, Xuezhi Fang. 11 Sep 2023.
Transformers as Support Vector Machines. Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak. 31 Aug 2023.
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? T. Kajitsuka, Issei Sato. 26 Jul 2023.
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. 21 Jul 2023.