2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,169 papers shown
Title
Understanding Co-speech Gestures in-the-wild
Sindhu B. Hegde
KR Prajwal
Taein Kwon
Andrew Zisserman
SLR
52
0
0
28 Mar 2025
AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization
Martin Kiss
Michal Hradiš
Martina Dvořáková
Václav Jiroušek
Filip Kersch
43
1
0
28 Mar 2025
High-Fidelity Diffusion Face Swapping with ID-Constrained Facial Conditioning
Dailan He
X. Wang
Shulun Wang
Guanglu Song
Bingqi Ma
Hao Shao
Y. Liu
Hongsheng Li
DiffM
60
0
0
28 Mar 2025
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
Erik Wallin
Fredrik Kahl
Lars Hammarstrand
OODD
52
0
0
27 Mar 2025
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen
Rolandos Alexandros Potamias
Shizhe Chen
Cordelia Schmid
3DH
48
0
0
27 Mar 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
67
0
0
27 Mar 2025
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
Haolin Liu
Xiaohang Zhan
Zizheng Yan
Zhongjin Luo
Yuxin Wen
Xiaoguang Han
54
0
0
27 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
82
2
0
27 Mar 2025
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani
Gianluca Villani
Theodoros Panagiotakopoulos
Marc Botet Colomer
Linus Harenstam-Nielsen
...
Pier Luigi Dovesi
Jussi Karlgren
Daniel Cremers
F. Tombari
Matteo Poggi
VLM
42
0
0
27 Mar 2025
LOCORE: Image Re-ranking with Long-Context Sequence Modeling
Zilin Xiao
Pavel Suma
Ayush Sachdeva
Hao-Jen Wang
Giorgos Kordopatis-Zilos
Giorgos Tolias
Vicente Ordonez
57
0
0
27 Mar 2025
Dual-Task Learning for Dead Tree Detection and Segmentation with Hybrid Self-Attention U-Nets in Aerial Imagery
Anis Ur Rahman
Einari Heinaro
Mete Ahishali
Samuli Junttila
32
1
0
27 Mar 2025
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Yiqing Shen
Bohan Liu
Chenjia Li
Lalithkumar Seenivasan
Mathias Unberath
VOS
75
2
0
27 Mar 2025
UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation
Yehui Shen
Lei Zhang
Qingqiu Li
Xiongwei Zhao
Y. Wang
Huimin Lu
Xieyuanli Chen
41
0
0
27 Mar 2025
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He
Zi-Xin Zou
Chia-Hao Chen
Y. Guo
Ding Liang
Chun Yuan
Wanli Ouyang
Yan-Pei Cao
Yangguang Li
49
0
0
27 Mar 2025
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen
Lingting Zhu
Zeyu Hu
Shengju Qian
Y. Chen
Xin Wang
G. Lee
97
1
0
26 Mar 2025
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI
Alejandro Lozano
M. W. Sun
James Burgess
Jeffrey Nirschl
Christopher Polzak
...
Xiaohan Wang
Alfred Seunghoon Song
Chiang Chia-Chun
Robert Tibshirani
Serena Yeung-Levy
LM&MA
73
1
0
26 Mar 2025
DINeMo: Learning Neural Mesh Models with no 3D Annotations
Weijie Guo
Guofeng Zhang
Wufei Ma
A. Yuille
3DH
96
0
0
26 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
84
0
0
26 Mar 2025
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
Rongyu Zhang
Menghang Dong
Yuan Zhang
Liang Heng
Xiaowei Chi
Gaole Dai
Li Du
Dan Wang
Yuan Du
MoE
81
0
0
26 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou
J. Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
65
1
0
25 Mar 2025
Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings
Chengan Che
Chao Wang
Tom Vercauteren
Sophia Tsoka
Luis C. García-Peraza-Herrera
MedIm
41
0
0
25 Mar 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
72
0
0
25 Mar 2025
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić
Yannis Kalantidis
Jiří Matas
Giorgos Tolias
VLM
46
0
0
25 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
68
0
0
25 Mar 2025
Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models
Bowei Tian
Xuntao Lyu
Meng Liu
Hongyi Wang
Ang Li
44
0
0
25 Mar 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou
Tianyi Zhang
Yuwen Xiong
Haonan Duan
Hengjun Pu
...
Chengyang Zhao
X. Zhu
Yu Qiao
Jifeng Dai
Y. Chen
59
1
0
25 Mar 2025
The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs
Jonathan Sauder
Viktor Domazetoski
G. Banc-Prandi
Gabriela Perna
Anders Meibom
D. Tuia
45
0
0
25 Mar 2025
Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
Yuke Lou
Yiming Wang
Zhen Wu
Rui Zhao
Wenjia Wang
Mingyi Shi
Taku Komura
37
0
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
42
0
0
25 Mar 2025
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
51
0
0
25 Mar 2025
SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors
Yiqing Li
X. Wang
Jiawei Wu
Yikun Ma
Zhi Jin
3DGS
39
0
0
25 Mar 2025
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Beak
Hyeonwoo Kim
Hanbyul Joo
41
0
0
25 Mar 2025
AvatarArtist: Open-Domain 4D Avatarization
Hongyu Liu
Xuan Wang
Ziyu Wan
Yue Ma
Jingye Chen
Yanbo Fan
Yujun Shen
Yibing Song
Qifeng Chen
41
0
0
25 Mar 2025
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Boqi Chen
Cédric Vincent-Cuaz
Lydia A. Schoenpflug
Manuel Madeira
Lisa Fournier
...
D. Thanou
V. Koelzer
Pascal Frossard
Gabriele Campanella
Gunnar Rätsch
46
0
0
24 Mar 2025
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian
Hanting Chen
Mengyu Zheng
Yuchen Liang
Chao Xu
Yunhe Wang
54
0
0
24 Mar 2025
PALATE: Peculiar Application of the Law of Total Expectation to Enhance the Evaluation of Deep Generative Models
Tadeusz Dziarmaga
Marcin Kądziołka
Artur Kasymov
Marcin Mazur
EGVM
89
0
0
24 Mar 2025
Surface-Aware Distilled 3D Semantic Features
Lukas Uzolas
E. Eisemann
Petr Kellnhofer
3DPC
3DH
78
0
0
24 Mar 2025
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang
Guodong Wang
Yizhou Jin
Di Huang
42
1
0
24 Mar 2025
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu
Zhentao Yu
Zixiang Zhou
Jun Zhou
Xiaoyu Jin
...
Chengfei Cai
Shiyu Tang
Qin Lin
Xiu Li
Qinglin Lu
DiffM
VGen
91
6
0
24 Mar 2025
FG²: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
Zimin Xia
Alexandre Alahi
58
0
0
24 Mar 2025
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X Chen
Misha Sra
Pradeep Sen
50
0
0
24 Mar 2025
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
Wenrui Cai
Qingjie Liu
Y. Wang
MoE
60
0
0
24 Mar 2025
Foundation Model for Whole-Heart Segmentation: Leveraging Student-Teacher Learning in Multi-Modal Medical Imaging
Abdul Qayyum
Moona Mazher
Devran Ugurlu
J. Solís-Lemus
C. Rodero
Steven A Niederer
35
0
0
24 Mar 2025
RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
Chengbo Yuan
Suraj Joshi
Shaoting Zhu
Hang Su
Hang Zhao
Yang Gao
VGen
48
3
0
24 Mar 2025
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation
Qin Wang
Benjamin Bruns
Hanno Scharr
Kai Krajsek
48
0
0
24 Mar 2025
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
59
1
0
24 Mar 2025
Out-of-distribution evaluations of channel agnostic masked autoencoders in fluorescence microscopy
Christian John Hurry
Jinjie Zhang
Olubukola Ishola
Emma Slade
Cuong Q. Nguyen
OOD
OODD
60
0
0
24 Mar 2025
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang
Fadime Sener
Angela Yao
OffRL
54
1
0
24 Mar 2025
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao
Wei Chen
Qiang Qiu
90
1
0
24 Mar 2025
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
Deepayan Das
Davide Talon
Yiming Wang
Massimiliano Mancini
Elisa Ricci
VLM
LRM
39
0
0
24 Mar 2025