Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
26 July 2023

Papers citing "Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?" (14 papers)
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan
18 Apr 2025

Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan
16 Apr 2025

Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)
Lena Strobl, Dana Angluin, Robert Frank
28 Mar 2025

Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang, Qianxiao Li
03 Jan 2025

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
02 Oct 2024

Attention layers provably solve single-location regression
P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment
Naoya Hasegawa, Issei Sato
26 Sep 2024

Differentially Private Kernel Density Estimation
Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu
03 Sep 2024

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou
08 Aug 2024

Dynamical Mean-Field Theory of Self-Attention Neural Networks
Ángel Poc-López, Miguel Aguilera
11 Jun 2024

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
Xingwu Chen, Difan Zou
02 Apr 2024

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li, Zhao-quan Song, Yu Xia, Tong Yu, Tianyi Zhou
26 Apr 2023

Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
26 May 2022

Universal Approximation Under Constraints is Possible with Transformers
Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić
07 Oct 2021