Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.08013
Cited By
FlexiViT: One Model for All Patch Sizes
15 December 2022
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FlexiViT: One Model for All Patch Sizes"
16 / 16 papers shown
Title
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang
Zhitong Xiong
Chenying Liu
Adam J. Stewart
Thomas Dujardin
...
Angelos Zavras
Franziska Gerken
Ioannis Papoutsis
Laura Leal-Taixé
Xiao Xiang Zhu
38
1
0
14 Mar 2025
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi
Pietro Astolfi
Melissa Hall
Reyhane Askari Hemmat
Yohann Benchetrit
...
Matthew Muckley
Karteek Alahari
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
AI4CE
VLM
45
3
0
05 Nov 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
74
54
0
19 Sep 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
Adaptive Depth Networks with Skippable Sub-Paths
Woochul Kang
22
1
0
27 Dec 2023
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Zeyu Yun
Juexiao Zhang
Bruno A. Olshausen
Yann LeCun
18
0
0
06 Oct 2023
Transformer-based end-to-end classification of variable-length volumetric data
Marzieh Oghbaie
Teresa Araújo
T. Emre
U. Schmidt-Erfurth
Hrvoje Bogunović
ViT
MedIm
11
3
0
13 Jul 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
20
2,983
0
14 Apr 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Luvcić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
29
562
0
10 Feb 2023
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
103
67
0
20 May 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Consistent Accelerated Inference via Confident Adaptive Transformers
Tal Schuster
Adam Fisch
Tommi Jaakkola
Regina Barzilay
AI4TS
179
69
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
246
1,817
0
18 Aug 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014
1