arXiv:2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,189 papers shown
Title
Learning from Massive Human Videos for Universal Humanoid Pose Control
Jiageng Mao
Siheng Zhao
Siqi Song
Tianheng Shi
Junjie Ye
Mingtong Zhang
Haoran Geng
Jitendra Malik
Vitor Campagnolo Guizilini
Yue Wang
93
5
0
18 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
121
50
0
18 Dec 2024
Retrieval Augmented Image Harmonization
Haolin Wang
Ming-Yu Liu
Zifei Yan
Chao Zhou
Longan Xiao
Wangmeng Zuo
77
0
0
18 Dec 2024
Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model
Yuqiu Liu
Jingxuan Xu
Mauricio Soroco
Yunchao Wei
Wuyang Chen
AI4CE
84
2
0
18 Dec 2024
ConDo: Continual Domain Expansion for Absolute Pose Regression
Zijun Li
Z. Cai
B. Yang
Xuelun Shen
Siqi Shen
Xiaoliang Fan
Michael Paulitsch
Cheng-Yu Wang
CLL
76
0
0
18 Dec 2024
Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Massimiliano Viola
Kevin Qu
Nando Metzger
B. Ke
Alexander Becker
Konrad Schindler
Anton Obukhov
VLM
MDE
91
4
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Y. Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VLM
84
0
0
18 Dec 2024
NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment
Andrea Dunn Beltran
Daniel Rho
Marc Niethammer
Roni Sengupta
90
2
0
17 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
W. Liu
X. Wang
3DGS
ViT
113
5
0
17 Dec 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
S. Nagendra
Kashif Rashid
Chaopeng Shen
Daniel Kifer
VLM
71
2
0
16 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
82
1
0
16 Dec 2024
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu
Yixin Chen
Junfeng Ni
Baoxiong Jia
Yu Liu
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM
127
4
0
16 Dec 2024
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Salar Abbaspourazad
Anshuman Mishra
Joseph D. Futoma
Andrew C. Miller
Ian Shapiro
88
0
0
15 Dec 2024
Adaptive Visual Perception for Robotic Construction Process: A Multi-Robot Coordination Framework
Jia Xu
Manish Dixit
Xi Wang
74
0
0
15 Dec 2024
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan
Sebastian Stapf
Ahmad Rahimi
Pedro M B Rezende
Yasaman Haghighi
...
Mathieu Salzmann
Davide Scaramuzza
Marc Pollefeys
Paolo Favaro
Alexandre Alahi
VLM
VGen
69
5
0
15 Dec 2024
Medical Manifestation-Aware De-Identification
Yuan Tian
Shuo Wang
Guangtao Zhai
MedIm
73
0
0
14 Dec 2024
Learning Visually Grounded Domain Ontologies via Embodied Conversation and Explanation
Jonghyuk Park
A. Lascarides
S. Ramamoorthy
73
0
0
13 Dec 2024
Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models
Faith Johnson
Ryan Meegan
Jack Lowry
Peter Oudemans
Kristin J. Dana
67
0
0
12 Dec 2024
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen
Xingyu Chen
Anpei Chen
Gerard Pons-Moll
Yuliang Xiu
3DGS
86
3
0
12 Dec 2024
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan
Ajay Bati
Sangmin Lee
Daniel Bolya
Judy Hoffman
James M. Rehg
144
2
0
12 Dec 2024
DECOR: Decomposition and Projection of Text Embeddings for Text-to-Image Customization
Geonhui Jang
Jin-Hwa Kim
Yong-Hyun Park
Junho Kim
Gayoung Lee
Yonghyun Jeong
DiffM
79
0
0
12 Dec 2024
Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction
Rafayel Mkrtchyan
Edvard Ghukasyan
Khoren Petrosyan
Hrant Khachatrian
Theofanis P. Raptis
81
0
0
12 Dec 2024
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
Shengxuming Zhang
Weihan Li
Tianhong Gao
Jiacong Hu
Haoming Luo
Mingli Song
Xiuming Zhang
Zunlei Feng
LM&MA
103
0
0
12 Dec 2024
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
Jiayi Su
Youhe Feng
Zheng Li
Jinhua Song
Yangfan He
Botao Ren
Botian Xu
AI4CE
91
2
0
10 Dec 2024
Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification
J. Ganz
Jonas Ammeling
Emely Rosbach
Ludwig Lausser
C. Bertram
Katharina Breininger
Marc Aubreville
67
0
0
09 Dec 2024
Detecting Discrepancies Between AI-Generated and Natural Images Using Uncertainty
Jun Nie
Yonggang Zhang
Tongliang Liu
Y. Cheung
Bo Han
Xinmei Tian
UQCV
90
0
0
08 Dec 2024
Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space
Mikolaj Czerkawski
Marcin Kluczek
Jędrzej S. Bojanowski
73
1
0
07 Dec 2024
Slicing Vision Transformer for Flexible Inference
Yitian Zhang
Huseyin Coskun
Xu Ma
Huan Wang
Ke Ma
Xi Chen
Derek Hao Hu
Y. Fu
ViT
76
0
0
06 Dec 2024
ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage
D. Ivanova
Marco Aversa
Paul Henderson
John Williamson
81
0
0
05 Dec 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
60
4
0
05 Dec 2024
Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts
Chenyang Zhu
Bin Xiao
Lin Shi
Shoukun Xu
Xu Zheng
MoE
91
9
0
05 Dec 2024
Coordinate In and Value Out: Training Flow Transformers in Ambient Space
Yuyang Wang
Anurag Ranjan
J. Susskind
Miguel Angel Bautista
3DPC
73
0
0
05 Dec 2024
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye
Tomas Jakab
Shangzhe Wu
Christian Rupprecht
Andrea Vedaldi
3DPC
3DH
97
1
0
05 Dec 2024
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
Jingyu Lin
Jiaqi Gu
Lubin Fan
Bojian Wu
Yujing Lou
Renjie Chen
Ligang Liu
Jieping Ye
3DGS
114
0
0
05 Dec 2024
Distillation of Diffusion Features for Semantic Correspondence
Frank Fundel
Johannes Schusterbauer
Vincent Tao Hu
Bjorn Ommer
DiffM
89
3
0
04 Dec 2024
DIVE: Taming DINO for Subject-Driven Video Editing
Yi Huang
Wei Xiong
He Zhang
Chaoqi Chen
Jianzhuang Liu
Mingfu Yan
Shifeng Chen
VGen
DiffM
76
0
0
04 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
93
0
0
04 Dec 2024
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
Marcin Przewięźlikowski
Randall Balestriero
Wojciech Jasiński
Marek Śmieja
Bartosz Zieliński
69
0
0
04 Dec 2024
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla
S. Vallecorsa
A. Schwing
Derek Hoiem
59
0
0
02 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning
Varun Belagali
Srikar Yellapragada
Alexandros Graikos
S. Kapse
Zilinghan Li
Tarak Nandi
Ravi K. Madduri
Prateek Prasanna
Joel H. Saltz
Dimitris Samaras
DiffM
75
1
0
02 Dec 2024
I Spy With My Little Eye: A Minimum Cost Multicut Investigation of Dataset Frames
Katharina Prasse
Isaac Bravo
Stefanie Walter
M. Keuper
67
1
0
02 Dec 2024
Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control
Seongmin Park
Hyungmin Kim
Wonseok Jeon
Juyoung Yang
Byeongwook Jeon
Yoonseon Oh
Jungwook Choi
93
1
0
02 Dec 2024
GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024
Xingyu Liu
Yingyue Li
Chengxi Li
Gu Wang
Chenyangguang Zhang
Ziqin Huang
Xiangyang Ji
3DGS
75
2
0
02 Dec 2024
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data
Ivan Deandres-Tame
Ruben Tolosana
Pietro Melzi
R. Vera-Rodríguez
Minchul Kim
...
Bernardo Biesseck
Pedro Vidal
Luiz Coelho
Roger Granada
David Menotti
72
2
0
02 Dec 2024
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang
Aosong Cheng
Ming Lu
Zhiyong Zhuo
Minqi Wang
Jiajun Cao
Shaobo Guo
Qi She
Shanghang Zhang
VLM
90
11
0
02 Dec 2024
EDTformer: An Efficient Decoder Transformer for Visual Place Recognition
Tong Jin
Feng Lu
Shuyu Hu
Chun Yuan
Yunpeng Liu
ViT
72
0
0
01 Dec 2024
FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
Yunpeng Bai
Qixing Huang
DiffM
91
0
0
01 Dec 2024
Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective
Jiangmeng Li
Zehua Zang
Qirui Ji
Chuxiong Sun
Wenwen Qiang
Junge Zhang
Changwen Zheng
Fuchun Sun
Hui Xiong
SSL
69
0
0
30 Nov 2024
TAROT: Targeted Data Selection via Optimal Transport
Lan Feng
Fan Nie
Yuejiang Liu
Alexandre Alahi
OT
128
1
0
30 Nov 2024