Mimetic Initialization of Self-Attention Layers
International Conference on Machine Learning (ICML), 2023
16 May 2023
Asher Trockman, J. Zico Kolter
arXiv: 2305.09828

Papers citing "Mimetic Initialization of Self-Attention Layers" (11 / 11 papers shown)

Cutting the Skip: Training Residual-Free Transformers
Yiping Ji, James Martens, Jianqiao Zheng, Ziqin Zhou, Peyman Moghadam, Xinyu Zhang, Hemanth Saratchandran, Simon Lucey
30 Sep 2025

Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification
Ayaka Tsutsumi, Guang Li, Ren Togo, Takahiro Ogawa, Satoshi Kondo, Miki Haseyama
28 Aug 2025

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, ..., Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang
Tags: VLM
09 Jun 2025

The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
Matteo Saponati, Pascal Sager, Pau Vilimelis Aceituno, Thilo Stadelmann, Benjamin Grewe
15 Feb 2025

Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature
Lingyun Wang, Bingjie Wang, Jay Chhablani, J. Sahel, Shaohua Pi
Tags: MedIm
17 Nov 2024

Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
Tags: LRM
02 Jul 2024

When can transformers reason with abstract symbols?
Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Josh Susskind
Tags: LRM, NAI
15 Oct 2023

LEMON: Lossless model expansion
International Conference on Learning Representations (ICLR), 2023
Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Yanghua Peng, Tian Ding, Hongxia Yang
12 Oct 2023

Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset
Zixun Huang, Keling Yao, Seth Z. Zhao, Chuanyu Pan, Chenfeng Xu
24 Sep 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

On the Relationship between Self-Attention and Convolutional Layers
International Conference on Learning Representations (ICLR), 2019
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
08 Nov 2019