arXiv:2302.05442
Scaling Vision Transformers to 22 Billion Parameters
10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim M. Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Xiao Wang
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
F. I. F. Richard Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
Papers citing "Scaling Vision Transformers to 22 Billion Parameters"
50 / 416 papers shown
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
33
0
0
12 May 2025
Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators
Will Hawkins
Chris Russell
Brent Mittelstadt
DiffM
28
0
0
06 May 2025
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen
Yücel Yemez
OCL
33
0
0
04 May 2025
Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization
Shunxian Gu
Chaoqun You
Bangbang Ren
Lailong Luo
Junxu Xia
Deke Guo
29
0
0
02 May 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki
Qi Dai
Lee Hyoseok
Chong Luo
Tae-Hyun Oh
59
0
0
01 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
74
0
0
30 Apr 2025
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
31
0
0
24 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Shumin Wang
Zhuoran Yang
L. Wang
Zhipeng Tang
Heng Li
Lehan Pan
Sha Zhang
Jie Peng
J. Ji
Y. Zhang
3DPC
38
0
0
17 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
26
1
0
14 Apr 2025
On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition
Adrian Cosma
Andy Catruna
Emilian Radoi
31
0
0
10 Apr 2025
Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin
J. Obando-Ceron
Xu Owen He
Aaron C. Courville
30
0
0
09 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
50
0
0
02 Apr 2025
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
Xianglong He
Junyi Chen
Di Huang
Zexiang Liu
Xiaoshui Huang
Wanli Ouyang
C. Yuan
Yangguang Li
DiffM
49
0
0
29 Mar 2025
ImF: Implicit Fingerprint for Large Language Models
Wu Jiaxuan
Peng Wanli
Fu Hang
Xue Yiming
Wen Juan
29
0
0
25 Mar 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
82
24
0
25 Mar 2025
VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Suhas G Hegde
S. K
Aruna Tiwari
49
0
0
25 Mar 2025
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai
Andrea Vedaldi
34
0
0
25 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
49
0
0
25 Mar 2025
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
59
1
0
24 Mar 2025
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liming Jiang
Qing Yan
Yumin Jia
Zichuan Liu
Hao Kang
Xin Lu
41
1
0
20 Mar 2025
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Kevin Wang
Ishaan Javali
Michał Bortkiewicz
Tomasz Trzciński
Benjamin Eysenbach
SSL
OffRL
62
0
0
19 Mar 2025
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
Harshal Kausadikar
Tanvi Kale
Onkar Susladkar
Sparsh Mittal
45
0
0
17 Mar 2025
APLA: A Simple Adaptation Method for Vision Transformers
Moein Sorkhei
Emir Konuk
Kevin Smith
Christos Matsoukas
43
0
0
14 Mar 2025
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen
Rui Qian
Wenze Hu
Tsu-jui Fu
Jialing Tong
...
Lezhi Li
Bowen Zhang
A. Schwing
Wei Liu
Y. Yang
45
0
0
13 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
46
0
0
06 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
64
0
0
05 Mar 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
33
0
0
03 Mar 2025
Proteina: Scaling Flow-based Protein Structure Generative Models
Tomas Geffner
Kieran Didi
Zuobai Zhang
Danny Reidenbach
Zhonglin Cao
...
Mario Geiger
Christian Dallago
E. Küçükbenli
Arash Vahdat
Karsten Kreis
DiffM
AI4CE
38
4
0
02 Mar 2025
BAnG: Bidirectional Anchored Generation for Conditional RNA Design
Roman Klypa
Alberto Bietti
Sergei Grudinin
35
0
0
28 Feb 2025
Unsupervised Parameter Efficient Source-free Post-pretraining
Abhishek Jha
Tinne Tuytelaars
Yuki M. Asano
OOD
38
0
0
28 Feb 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
76
0
0
27 Feb 2025
Bayesian Computation in Deep Learning
Wenlong Chen
Bolian Li
Ruqi Zhang
Yingzhen Li
BDL
65
0
0
25 Feb 2025
Optimizing Estimators of Squared Calibration Errors in Classification
Sebastian G. Gruber
Francis Bach
61
1
0
24 Feb 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
47
1
0
24 Feb 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
63
1
0
24 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
67
0
0
24 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Pöppel
Sepp Hochreiter
Johannes Brandstetter
VLM
53
36
0
24 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
46
2
0
21 Feb 2025
One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems
Zuoli Tang
Zhaoxin Huan
Zihao Li
Xiaolu Zhang
Jun Hu
Chilin Fu
Jun Zhou
Lixin Zou
Chenliang Li
59
15
0
20 Feb 2025
Object-Centric Latent Action Learning
Albina Klepach
Alexander Nikulin
Ilya Zisman
Denis Tarasov
Alexander Derevyagin
Andrei Polubarov
Nikita Lyubaykin
Vladislav Kurenkov
41
0
0
13 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Z. Yang
Mike Zheng Shou
MoE
63
0
0
10 Feb 2025
PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts
Zeman Li
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
MoMe
75
1
0
10 Feb 2025
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Feng Wang
Yaodong Yu
Guoyizhe Wei
Wei Shao
Yuyin Zhou
Alan Yuille
Cihang Xie
ViT
85
4
0
06 Feb 2025
FuXi-α: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
Yufei Ye
Wei Guo
Jin Yao Chin
Hao Wang
Hong Zhu
...
Yuyang Ye
Y. Liu
Ruiming Tang
Defu Lian
Enhong Chen
86
2
0
05 Feb 2025
PDC-ViT: Source Camera Identification using Pixel Difference Convolution and Vision Transformer
O. Elharrouss
Y. Akbari
Noor Almaadeed
S. Al-Maadeed
F. Khelifi
Ahmed Bouridane
34
1
0
28 Jan 2025
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Y. Wang
Wang Chen
Kang Yang
Deying Li
Jianfei Cai
3DPC
63
3
0
17 Jan 2025
Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Jialin Wu
Kaikai Pan
Yanjiao Chen
Jiangyi Deng
Shengyuan Pang
Wenyuan Xu
ViT
AAML
41
0
0
13 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
59
17
0
31 Dec 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
71
2
0
20 Dec 2024