Mimetic Initialization of Self-Attention Layers (arXiv:2305.09828)
International Conference on Machine Learning (ICML), 2023
Asher Trockman, J. Zico Kolter
16 May 2023
Papers citing "Mimetic Initialization of Self-Attention Layers" (31 papers)

Cutting the Skip: Training Residual-Free Transformers
Yiping Ji, James Martens, Jianqiao Zheng, Ziqin Zhou, Peyman Moghadam, Xinyu Zhang, Hemanth Saratchandran, Simon Lucey
30 Sep 2025

Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification
Ayaka Tsutsumi, Guang Li, Ren Togo, Takahiro Ogawa, Satoshi Kondo, Miki Haseyama
28 Aug 2025

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, ..., Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang
09 Jun 2025

Scalable Complexity Control Facilitates Reasoning Ability of LLMs
Liangkai Hang, Junjie Yao, Zhiwei Bai, Jiahao Huo, Yang Chen, ..., Feiyu Xiong, Y. Zhang, Weinan E, Hongkang Yang, Zhi-hai Xu
29 May 2025

Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning
Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney
28 May 2025

Structured Initialization for Vision Transformers
Jianqiao Zheng, Xueqian Li, Hemanth Saratchandran, Simon Lucey
26 May 2025

The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
Matteo Saponati, Pascal Sager, Pau Vilimelis Aceituno, Thilo Stadelmann, Benjamin Grewe
15 Feb 2025

Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature
Lingyun Wang, Bingjie Wang, Jay Chhablani, J. Sahel, Shaohua Pi
17 Nov 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Neural Information Processing Systems (NeurIPS), 2024
Alexander C. Li, Yuandong Tian, Bin Chen, Deepak Pathak, Xinlei Chen
14 Nov 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar
24 Oct 2024

Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman, Hrayr Harutyunyan, J. Zico Kolter, Sanjiv Kumar, Srinadh Bhojanapalli
14 Oct 2024

FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models
Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Xin Geng
28 Sep 2024

Kolmogorov-Arnold Transformer
Xingyi Yang, Xinchao Wang
16 Sep 2024

Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024

Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers
Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Ahmet Enis Cetin, Ulas Bagci
22 May 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
Neural Information Processing Systems (NeurIPS), 2024
Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Z. Xu
08 May 2024

Structured Initialization for Attention in Vision Transformers
Jianqiao Zheng, Xueqian Li, Simon Lucey
01 Apr 2024

SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, I. Redko
15 Feb 2024

Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
International Conference on Machine Learning (ICML), 2024
Simone Bombari, Marco Mondelli
05 Feb 2024

Convolutional Initialization for Data-Efficient Vision Transformers
Jianqiao Zheng, Xueqian Li, Simon Lucey
23 Jan 2024

Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner
09 Jan 2024

Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
International Conference on Machine Learning (ICML), 2024
Randall Balestriero, Romain Cosentino, Sarath Shekkizhar
04 Dec 2023

Initializing Models with Larger Ones
International Conference on Learning Representations (ICLR), 2024
Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu
30 Nov 2023

Simplifying Transformer Blocks
International Conference on Learning Representations (ICLR), 2024
Bobby He, Thomas Hofmann
03 Nov 2023

When can transformers reason with abstract symbols?
Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Josh Susskind
15 Oct 2023

LEMON: Lossless Model Expansion
International Conference on Learning Representations (ICLR), 2024
Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Yanghua Peng, Tian Ding, Hongxia Yang
12 Oct 2023

Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song, Yiqiao Zhong
07 Oct 2023

Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset
Zixun Huang, Keling Yao, Seth Z. Zhao, Chuanyu Pan, Chenfeng Xu
24 Sep 2023

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Neural Information Processing Systems (NeurIPS), 2023
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
21 Jul 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

On the Relationship between Self-Attention and Convolutional Layers
International Conference on Learning Representations (ICLR), 2020
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
08 Nov 2019