Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.06093
Cited By
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
11 August 2023
Yongqian Huang
Peng Ye
Xiaoshui Huang
Sheng R. Li
Tao Chen
Tong He
Wanli Ouyang
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Experts Weights Averaging: A New General Training Scheme for Vision Transformers"
8 / 8 papers shown
Title
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm
DiffM
129
1
0
07 Feb 2025
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang
Yikang Shen
Zeyu Huang
Jie Zhou
Wenge Rong
Zhang Xiong
MoE
99
42
0
11 Oct 2022
Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang
Wei Cui
Yifan Xiong
Ziyue Yang
Ze Liu
...
Joe Chau
Peng Cheng
Fan Yang
Mao Yang
Y. Xiong
MoE
92
109
0
07 Jun 2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé
Matthieu Kirchmeyer
Thibaud Rahier
A. Rakotomamonjy
Patrick Gallinari
Matthieu Cord
OOD
191
128
0
19 May 2022
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
149
327
0
18 Feb 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
303
5,773
0
29 Apr 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
168
686
0
22 Apr 2021
RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding
X. Zhang
Ningning Ma
Jungong Han
Guiguang Ding
Jian-jun Sun
120
1,544
0
11 Jan 2021
1