Mimetic Initialization of Self-Attention Layers
Asher Trockman, J. Zico Kolter
16 May 2023 · arXiv:2305.09828

Papers citing "Mimetic Initialization of Self-Attention Layers"

25 papers
Freqformer: Frequency-Domain Transformer for 3-D Visualization and Quantification of Human Retinal Circulation
Lingyun Wang, Bingjie Wang, Jay Chhablani, J. Sahel, Shaohua Pi
17 Nov 2024 · MedIm

On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Alexander C. Li, Yuandong Tian, B. Chen, Deepak Pathak, Xinlei Chen
14 Nov 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar
24 Oct 2024

Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman, Hrayr Harutyunyan, J. Zico Kolter, Sanjiv Kumar, Srinadh Bhojanapalli
14 Oct 2024 · Mamba

FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models
Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Xin Geng
28 Sep 2024 · AI4CE

Kolmogorov-Arnold Transformer
Xingyi Yang, Xinchao Wang
16 Sep 2024

Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024 · LRM

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Z. Xu
08 May 2024

Structured Initialization for Attention in Vision Transformers
Jianqiao Zheng, Xueqian Li, Simon Lucey
01 Apr 2024 · ViT

SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, I. Redko
15 Feb 2024 · AI4TS

Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Simone Bombari, Marco Mondelli
05 Feb 2024

Convolutional Initialization for Data-Efficient Vision Transformers
Jianqiao Zheng, Xueqian Li, Simon Lucey
23 Jan 2024

Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner
09 Jan 2024

Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
Randall Balestriero, Romain Cosentino, Sarath Shekkizhar
04 Dec 2023

Initializing Models with Larger Ones
Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu
30 Nov 2023

Simplifying Transformer Blocks
Bobby He, Thomas Hofmann
03 Nov 2023

When can transformers reason with abstract symbols?
Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Josh Susskind
15 Oct 2023 · LRM, NAI

LEMON: Lossless model expansion
Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
12 Oct 2023

Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song, Yiqiao Zhong
07 Oct 2023

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
21 Jul 2023 · MLT

Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

Understanding the Covariance Structure of Convolutional Filters
Asher Trockman, Devin Willmott, J. Zico Kolter
07 Oct 2022

Training Vision Transformers with Only 2040 Images
Yunhao Cao, Hao Yu, Jianxin Wu
26 Jan 2022 · ViT

Patches Are All You Need?
Asher Trockman, J. Zico Kolter
24 Jan 2022 · ViT

ResNet strikes back: An improved training procedure in timm
Ross Wightman, Hugo Touvron, Hervé Jégou
01 Oct 2021 · AI4TS