Inductive Biases and Variable Creation in Self-Attention Mechanisms

19 October 2021
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Cyril Zhang

Papers citing "Inductive Biases and Variable Creation in Self-Attention Mechanisms"

50 / 94 papers shown
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel
Kiran Tomlinson
Adith Swaminathan
Jennifer Neville
LRM
20
0
0
13 May 2025
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
Zhaiming Shen
Alex Havrilla
Rongjie Lai
A. Cloninger
Wenjing Liao
37
0
0
06 May 2025
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao
Yanming Lai
Yang Wang
Bokai Yan
34
0
0
18 Apr 2025
Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao
Yanming Lai
Defeng Sun
Yang Wang
Bokai Yan
29
0
0
16 Apr 2025
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
Alireza Mousavi-Hosseini
Clayton Sanford
Denny Wu
Murat A. Erdogdu
43
0
0
14 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
96
2
0
04 Mar 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi
Nishanth Dikkala
Zhiyuan Li
Sanjiv Kumar
Sashank J. Reddi
OffRL
LRM
AI4CE
50
9
0
24 Feb 2025
Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
Arya Honarpisheh
Mustafa Bozdag
M. Sznaier
Octavia Camps
Mamba
67
0
0
03 Feb 2025
Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang
Qianxiao Li
44
9
0
03 Jan 2025
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
Alex Havrilla
Wenjing Liao
26
8
0
11 Nov 2024
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi
Ghazal Khalighinejad
Anej Svete
Josef Valvoda
Ryan Cotterell
Brian DuSell
NAI
36
2
0
11 Nov 2024
Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric
Baiyuan Chen
MLT
18
0
0
23 Oct 2024
On Rank-Dependent Generalisation Error Bounds for Transformers
Lan V. Truong
32
2
0
15 Oct 2024
Learning Linear Attention in Polynomial Time
Morris Yau
Ekin Akyürek
Jiayuan Mao
Joshua B. Tenenbaum
Stefanie Jegelka
Jacob Andreas
17
2
0
14 Oct 2024
Generalizable autoregressive modeling of time series through functional narratives
Ran Liu
Wenrui Ma
Ellen L. Zippi
Hadi Pouransari
Jingyun Xiao
...
Behrooz Mahasseni
Juri Minxha
Erdrin Azemi
Eva L. Dyer
Ali Moin
AI4TS
25
0
0
10 Oct 2024
Large Language Models as Markov Chains
Oussama Zekri
Ambroise Odonnat
Abdelhakim Benechehab
Linus Bleistein
Nicolas Boullé
I. Redko
34
9
0
03 Oct 2024
On the Inductive Bias of Stacking Towards Improving Reasoning
Nikunj Saunshi
Stefani Karp
Shankar Krishnan
Sobhan Miryoosefi
Sashank J. Reddi
Sanjiv Kumar
LRM
AI4CE
29
4
0
27 Sep 2024
Non-asymptotic Convergence of Training Transformers for Next-token Prediction
Ruiquan Huang
Yingbin Liang
Jing Yang
21
5
0
25 Sep 2024
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
Jiaheng Hu
Rose Hendrix
Ali Farhadi
Aniruddha Kembhavi
Roberto Martin-Martin
Peter Stone
Kuo-Hao Zeng
Kiana Ehsani
31
7
0
25 Sep 2024
In-Context Learning with Representations: Contextual Generalization of Trained Transformers
Tong Yang
Yu Huang
Yingbin Liang
Yuejie Chi
MLT
27
5
0
19 Aug 2024
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
Yuchen Li
Alexandre Kirchmeyer
Aashay Mehta
Yilong Qin
Boris Dadachev
Kishore Papineni
Sanjiv Kumar
Andrej Risteski
38
0
0
22 Jul 2024
Representing Rule-based Chatbots with Transformers
Dan Friedman
Abhishek Panigrahi
Danqi Chen
59
1
0
15 Jul 2024
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors
Tam Thuc Do
Parham Eftekhar
Seyed Alireza Hosseini
Gene Cheung
Philip A. Chou
21
0
0
06 Jun 2024
Length independent generalization bounds for deep SSM architectures
Dániel Rácz
M. Petreczky
Bálint Daróczy
34
1
0
30 May 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
42
6
0
24 May 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu
Baihe Huang
Shaolun Zhang
Michael I. Jordan
Jiantao Jiao
Yuandong Tian
Stuart Russell
LRM
AI4CE
47
13
0
07 May 2024
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
Jerry Yao-Chieh Hu
Pei-Hsuan Chang
Haozheng Luo
Hong-Yu Chen
Weijian Li
Wei-Po Wang
Han Liu
31
25
0
04 Apr 2024
Mechanics of Next Token Prediction with Self-Attention
Yingcong Li
Yixiao Huang
M. E. Ildiz
A. S. Rawat
Samet Oymak
16
25
0
12 Mar 2024
On the Generalization Ability of Unsupervised Pretraining
Yuyang Deng
Junyuan Hong
Jiayu Zhou
M. Mahdavi
SSL
35
4
0
11 Mar 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
Zhenyu (Allen) Zhang
Runjin Chen
Shiwei Liu
Zhewei Yao
Olatunji Ruwase
Beidi Chen
Xiaoxia Wu
Zhangyang Wang
26
26
0
05 Mar 2024
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
M. E. Ildiz
Yixiao Huang
Yingcong Li
A. S. Rawat
Samet Oymak
16
17
0
21 Feb 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li
Hong Liu
Denny Zhou
Tengyu Ma
LRM
AI4CE
20
95
0
20 Feb 2024
Why are Sensitive Functions Hard for Transformers?
Michael Hahn
Mark Rofin
20
23
0
15 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
MLT
14
11
0
06 Feb 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective
Gianluigi Lopardo
F. Precioso
Damien Garreau
6
4
0
05 Feb 2024
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Simone Bombari
Marco Mondelli
26
3
0
05 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
95
78
0
01 Feb 2024
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon
Jason D. Lee
Qi Lei
Benjamin Van Roy
15
18
0
28 Jan 2024
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Kaiyue Wen
Yuchen Li
Bing Liu
Andrej Risteski
16
21
0
03 Dec 2023
On the Convergence of Encoder-only Shallow Transformers
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
V. Cevher
34
5
0
02 Nov 2023
Sequence Length Independent Norm-Based Generalization Bounds for Transformers
Jacob Trauger
Ambuj Tewari
24
11
0
19 Oct 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
MLT
34
33
0
19 Oct 2023
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Adam Block
Dylan J. Foster
Akshay Krishnamurthy
Max Simchowitz
Cyril Zhang
23
4
0
17 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian
Yiping Wang
Zhenyu (Allen) Zhang
Beidi Chen
Simon S. Du
16
35
0
01 Oct 2023
Auto-Regressive Next-Token Predictors are Universal Learners
Eran Malach
LRM
14
36
0
13 Sep 2023
Breaking through the learning plateaus of in-context learning in Transformer
Jingwen Fu
Tao Yang
Yuwang Wang
Yan Lu
Nanning Zheng
30
0
0
12 Sep 2023
Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis
Li Du
Yequan Wang
Xingrun Xing
Yiqun Ya
Xiang Li
Xin Jiang
Xuezhi Fang
HILM
15
13
0
11 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
25
43
0
31 Aug 2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka
Issei Sato
29
16
0
26 Jul 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu
Tianyu Guo
Yu Bai
Song Mei
MLT
13
22
0
21 Jul 2023