2304.07193
Cited By
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing "DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,182 papers shown
Title
RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
Chengbo Yuan
Suraj Joshi
Shaoting Zhu
Hang Su
Hang Zhao
Yang Gao
VGen
48
3
0
24 Mar 2025
FG²: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
Zimin Xia
Alexandre Alahi
58
0
0
24 Mar 2025
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian
Hanting Chen
Mengyu Zheng
Yuchen Liang
Chao Xu
Yunhe Wang
54
0
0
24 Mar 2025
Foundation Model for Whole-Heart Segmentation: Leveraging Student-Teacher Learning in Multi-Modal Medical Imaging
Abdul Qayyum
Moona Mazher
Devran Ugurlu
J. Solís-Lemus
C. Rodero
Steven A Niederer
35
0
0
24 Mar 2025
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
Wenrui Cai
Qingjie Liu
Y. Wang
MoE
60
0
0
24 Mar 2025
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang
Fadime Sener
Angela Yao
OffRL
54
1
0
24 Mar 2025
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
59
1
0
24 Mar 2025
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang
Guodong Wang
Yizhou Jin
Di Huang
42
1
0
24 Mar 2025
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
Deepayan Das
Davide Talon
Yiming Wang
Massimiliano Mancini
Elisa Ricci
VLM
LRM
42
0
0
24 Mar 2025
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Boqi Chen
Cédric Vincent-Cuaz
Lydia A. Schoenpflug
Manuel Madeira
Lisa Fournier
...
D. Thanou
V. Koelzer
Pascal Frossard
Gabriele Campanella
Gunnar Rätsch
46
0
0
24 Mar 2025
Out-of-distribution evaluations of channel agnostic masked autoencoders in fluorescence microscopy
Christian John Hurry
Jinjie Zhang
Olubukola Ishola
Emma Slade
Cuong Q. Nguyen
OOD
OODD
60
0
0
24 Mar 2025
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X Chen
Misha Sra
Pradeep Sen
50
0
0
24 Mar 2025
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu
Zhentao Yu
Zixiang Zhou
Jun Zhou
Xiaoyu Jin
...
Chengfei Cai
Shiyu Tang
Qin Lin
Xiu Li
Qinglin Lu
DiffM
VGen
91
6
0
24 Mar 2025
Histomorphology-driven multi-instance learning for breast cancer WSI classification
Baizhi Wang
Rui Yan
Wenxin Ma
Xu Zhang
Yuhao Wang
X. Li
Yunjie Gu
Zihang Jiang
Shuoling Zhou
46
0
0
23 Mar 2025
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li
Qi Ma
Runyi Yang
Huapeng Li
Mengjiao Ma
...
E. Konukoglu
Theo Gevers
Luc Van Gool
Martin R. Oswald
Danda Pani Paudel
3DGS
VLM
71
0
0
23 Mar 2025
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao
Jinlong Li
Shuang Wang
Mengyao Wu
Qi Zang
N. Sebe
Zhun Zhong
87
0
0
23 Mar 2025
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang
Yanjiang Liu
Ben He
Y. Lu
Hongyu Lin
Jia Zheng
Xianpei Han
Le Sun
Yingfei Sun
39
0
0
23 Mar 2025
InstructVEdit: A Holistic Approach for Instructional Video Editing
Chi Zhang
C. Feng
Feng Yan
Qiming Zhang
Mingjin Zhang
Yujie Zhong
Jing Zhang
Lin Ma
DiffM
VGen
39
0
0
22 Mar 2025
BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors
Yu Wang
Junxian Mu
Hongzhi Huang
Qilong Wang
Pengfei Zhu
Q. Hu
55
0
0
22 Mar 2025
EMPLACE: Self-Supervised Urban Scene Change Detection
Tim Alpherts
Sennay Ghebreab
N. V. Noord
36
0
0
22 Mar 2025
Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon
Hyeontae Son
Dongcheol Hur
Sangwook Kim
3DH
59
1
0
22 Mar 2025
Exploring Few-Shot Object Detection on Blood Smear Images: A Case Study of Leukocytes and Schistocytes
Davide Antonio Mura
Michela Pinna
Lorenzo Putzu
A. Loddo
Alessandra Perniciano
Olga Mulas
Cecilia Di Ruberto
42
0
0
21 Mar 2025
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang
Philippe Weinzaepfel
Vincent Leroy
Lourdes Agapito
Jérôme Revaud
46
0
0
21 Mar 2025
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia
David Bourgin
Krishna Kumar Singh
Yuheng Li
Yan Kang
Zhan Xu
N. Jha
Y. Liu
DiffM
VGen
72
0
0
21 Mar 2025
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Vishwesh Ramanathan
Tony Xu
Pushpak Pati
Faruk Ahmed
Maged Goubran
Anne L. Martel
43
0
0
21 Mar 2025
Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting
Simona Kocour
Assia Benbihi
Aikaterini Adam
Torsten Sattler
3DPC
41
0
0
21 Mar 2025
MagicColor: Multi-Instance Sketch Colorization
Y. Zhang
Yue Ma
Bingyuan Wang
Qifeng Chen
Zeyu Wang
DiffM
65
0
0
21 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
54
0
0
21 Mar 2025
GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations
Z. Liu
Fan Zhang
Junfeng Jiao
Ni Lao
Gengchen Mai
47
1
0
20 Mar 2025
Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras
Beilei Cui
Long Bai
Mobarakol Islam
An-Chi Wang
Z. Ma
...
Feng Li
Zhen Chen
Zhongliang Jiang
Nassir Navab
Hongliang Ren
MedIm
60
0
0
20 Mar 2025
Single Image Iterative Subject-driven Generation and Editing
Yair Shpitzer
Gal Chechik
Idan Schwartz
48
0
0
20 Mar 2025
A Vision Centric Remote Sensing Benchmark
Abduljaleel Adejumo
Faegheh Yeganli
Clifford Broni-Bediako
Aoran Xiao
Naoto Yokoya
Mennatullah Siam
60
0
0
20 Mar 2025
Disentangled and Interpretable Multimodal Attention Fusion for Cancer Survival Prediction
Aniek Eijpe
Soufyan Lakbir
Melis Erdal Cesur
Sara P. Oliveira
Sanne Abeln
Wilson Silva
36
0
0
20 Mar 2025
TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data
Rohit Kundu
Athula Balachandran
A. Roy-Chowdhury
40
0
0
20 Mar 2025
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim
Gwangtak Bae
E. Lee
Young Min Kim
3DPC
3DV
60
0
0
20 Mar 2025
MapGlue: Multimodal Remote Sensing Image Matching
Peihao Wu
Yongxiang Yao
Wenfei Zhang
Dong Wei
Y. Wan
Yansheng Li
Yongjun Zhang
44
0
0
20 Mar 2025
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli
Christos Sakaridis
Mattia Segu
Y. Yang
Siyuan Li
Wim Abbeloos
Luc Van Gool
MDE
40
0
0
20 Mar 2025
M3: 3D-Spatial MultiModal Memory
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
54
0
0
20 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
100
0
0
20 Mar 2025
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Marc Benedí San Millán
Angela Dai
Matthias Nießner
DiffM
67
0
0
20 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
95
3
0
19 Mar 2025
Efficient Personalization of Quantized Diffusion Model without Backpropagation
H. Seo
Wongi Jeong
Kyungryeol Lee
Se Young Chun
DiffM
MQ
76
0
0
19 Mar 2025
Object-Centric Pretraining via Target Encoder Bootstrapping
Nikola Đukić
Tim Lebailly
Tinne Tuytelaars
OCL
66
0
0
19 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
58
0
0
19 Mar 2025
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach
Vaibhav Rathore
S. Bagchi
Saikat Dutta
Sarthak Mehrotra
Zsolt Kira
Biplab Banerjee
OOD
74
1
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
P. Radeva
49
0
0
19 Mar 2025
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam
Soowon Son
Zhan Xu
Jing Shi
Difan Liu
Feng Liu
Aashish Misraa
Seungryong Kim
Yang Zhou
DiffM
39
0
0
19 Mar 2025
CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
Masud Ahmed
Zahid Hasan
Syed Arefinul Haque
A. Faridee
S. Purushotham
Suya You
Nirmalya Roy
48
0
0
19 Mar 2025
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models
Teng-Fang Hsiao
Bo-Kai Ruan
Yi-Lun Wu
Tzu-Ling Lin
Hong-Han Shuai
VLM
48
0
0
19 Mar 2025
Distilling 3D distinctive local descriptors for 6D pose estimation
Amir Hamza
Andrea Caraffa
Davide Boscaini
Fabio Poiesi
44
0
0
19 Mar 2025