ResearchTrend.AI
DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
    VLM
    CLIP
    SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 2,169 papers shown
CAGE-GS: High-fidelity Cage Based 3D Gaussian Splatting Deformation
Yifei Tong
RunZe Tian
Xiao Han
Dingyao Liu
Fenggen Yu
Yan Zhang
3DGS
31
0
0
17 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
X. Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
29
0
0
17 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
27
0
0
17 Apr 2025
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery
Wei Zhang
Miaoxin Cai
Yaqian Ning
T. Zhang
Yin Zhuang
He Chen
Jun Li
Xuerui Mao
36
0
0
17 Apr 2025
SOPHY: Generating Simulation-Ready Objects with Physical Materials
Junyi Cao
Evangelos Kalogerakis
AI4CE
36
0
0
17 Apr 2025
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen
Ting Han
Changshe Zhang
Xin Luo
Meiliu Wu
Guorong Cai
Jinhe Su
MDE
32
0
0
17 Apr 2025
Digital Twin Generation from Visual Data: A Survey
Andrew Melnik
Benjamin Alt
Giang Hoang Nguyen
Artur Wilkowski
Maciej Stefańczyk
Qirui Wu
Sinan Harms
Helge Rhodin
Manolis Savva
Michael Beetz
3DGS
VGen
41
0
0
17 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue
Yulin Wang
Haojun Jiang
Pan Liu
S. Song
Gao Huang
VGen
27
0
0
17 Apr 2025
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
Jiale Tao
Yanbing Zhang
Qixun Wang
Yiji Cheng
Haofan Wang
...
Ruihuang Li
Linqing Wang
Chunyu Wang
Qin Lin
Qinglin Lu
DiffM
47
1
0
16 Apr 2025
SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld
Zhe Chen
Davide Davoli
Jiapeng Tang
Saimon Terazawa
Ko Nishino
Matthias Nießner
3DH
3DGS
53
0
0
16 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
25
0
0
16 Apr 2025
Adapting a World Model for Trajectory Following in a 3D Game
Marko Tot
Shu Ishida
Abdelhak Lemkhenter
David Bignell
Pallavi Choudhury
...
Tarun Gupta
Darren Gehring
Sam Devlin
Sergio Valcarcel Macua
Raluca Georgescu
38
0
0
16 Apr 2025
Search is All You Need for Few-shot Anomaly Detection
Qishan Wang
Jia Guo
Shuyong Gao
H. Wang
Li Xiong
J. Hu
Hanqi Guo
Wenqiang Zhang
53
0
0
16 Apr 2025
Learning What NOT to Count
Adriano D'Alessandro
Ali Mahdavi-Amiri
Ghassan Hamarneh
27
0
0
16 Apr 2025
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
Zihui Zhang
Yafei Yang
Hongtao Wen
Bo Yang
3DPC
30
0
0
16 Apr 2025
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
Mengshi Qi
Pengfei Zhu
X. Li
Xiaoyang Bi
Lu Qi
Huadong Ma
Ming Yang
VOS
VLM
42
0
0
16 Apr 2025
MixSignGraph: A Sign Sequence is Worth Mixed Graphs of Nodes
Shiwei Gan
Yafeng Yin
Zhiwei Jiang
Hongkai Wen
Lei Xie
Sanglu Lu
SLR
37
0
0
16 Apr 2025
Metric-Solver: Sliding Anchored Metric Depth Estimation from a Single Image
Tao Wen
J. Wang
Y. Chen
Shugong Xu
Chi Zhang
Xuelong Li
MDE
31
0
0
16 Apr 2025
ADT: Tuning Diffusion Models with Adversarial Supervision
Dazhong Shen
Guanglu Song
Y. Zhang
Bingqi Ma
Lujundong Li
D. Jiang
Zhuofan Zong
Y. Liu
DiffM
40
0
0
15 Apr 2025
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections
Alireza Salehi
Mohammadreza Salehi
Reshad Hosseini
Cees G. M. Snoek
Makoto Yamada
Mohammad Sabokrou
VLM
26
0
0
15 Apr 2025
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh
X. Wang
Daiheng Zhang
Dongyu Xue
Fei Ye
Shujian Huang
Zaixiang Zheng
Quanquan Gu
29
1
0
15 Apr 2025
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang
Haitao Lin
Tianyu Wang
Yanwei Fu
Xiangyang Xue
Y. X. Zhu
3DPC
34
0
0
15 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
K. Enevoldsen
Niklas Muennighoff
VLM
35
0
0
14 Apr 2025
MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model
Jian Liu
Wei Sun
Hui Yang
Jin Zheng
Zichen Geng
Hossein Rahmani
Ajmal Saeed Mian
DiffM
33
0
0
14 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
55
0
0
14 Apr 2025
Semantic Depth Matters: Explaining Errors of Deep Vision Networks through Perceived Class Similarities
Katarzyna Filus
Michał Romaszewski
Mateusz Żarski
26
0
0
14 Apr 2025
OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation
Si-Tong Wei
Rui-Huan Wang
Chuan-Zhi Zhou
Baoquan Chen
Peng-Shuai Wang
29
1
0
14 Apr 2025
An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings
Matías Piqueras
Alexandra Segerberg
Matteo Magnani
Måns Magnusson
Nataša Sladoje
33
0
0
14 Apr 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li
Xingxuan Zhang
Hao Zou
Yige Guo
Renzhe Xu
Yilong Liu
Chuzhao Zhu
Yue He
Peng Cui
VLM
37
0
0
14 Apr 2025
Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition
Changwei Wang
Shunpeng Chen
Yukun Song
Rongtao Xu
Zherui Zhang
...
Shide Du
Zhiwei Xu
Longxiang Gao
Li Guo
Shibiao Xu
19
0
0
14 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
34
0
0
14 Apr 2025
Efficient Generative Model Training via Embedded Representation Warmup
Deyuan Liu
Peng Sun
Xufeng Li
Tao Lin
19
0
0
14 Apr 2025
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Xingjian Leng
Jaskirat Singh
Yunzhong Hou
Zhenchang Xing
Saining Xie
Liang Zheng
34
0
0
14 Apr 2025
ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting
Huiqi Wu
Jianbo Mei
Yingjie Huang
Yining Xu
Jingjiao You
Yilong Liu
Li Yao
3DGS
27
0
0
14 Apr 2025
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang
Arjun Chandra
Aoming Liu
Venkatesh Saligrama
Boqing Gong
MLLM
VLM
45
0
0
13 Apr 2025
CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models
P. Guhan
D. Kothandaraman
Tsung-Wei Huang
Guan-Ming Su
Dinesh Manocha
DiffM
VGen
34
0
0
13 Apr 2025
VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro
Zheyuan Zhang
Monica Dou
Linkai Peng
Hongyi Pan
Ulas Bagci
Boqing Gong
VLM
56
0
0
12 Apr 2025
SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
Qingyuan Wang
Rui Song
Jiaojiao Li
Kerui Cheng
David Ferstl
Yinlin Hu
3DPC
43
0
0
12 Apr 2025
crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels
M. Herde
Lukas Lührs
Denis Huseljic
Bernhard Sick
22
0
0
12 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
37
0
0
12 Apr 2025
MASH: Masked Anchored SpHerical Distances for 3D Shape Representation and Generation
Changhao Li
Yu Xin
Xiaowei Zhou
Ariel Shamir
Hao Zhang
Ligang Liu
R. Hu
48
0
0
12 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
72
0
0
11 Apr 2025
Boosting multi-demographic federated learning for chest x-ray analysis using general-purpose self-supervised representations
Mahshad Lotfinia
Arash Tayebiarasteh
Samaneh Samiei
Mehdi Joodaki
Soroosh Tayebi Arasteh
23
0
0
11 Apr 2025
SARFormer -- An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data
Jonathan Prexl
M. Recla
M. Schmitt
29
0
0
11 Apr 2025
Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models
Jiahuan Long
Tingsong Jiang
Wen Yao
Yizhe Xiong
Zhengqin Xu
Shuai Jia
Chao Ma
19
0
0
11 Apr 2025
DSM: Building A Diverse Semantic Map for 3D Visual Grounding
Qinghongbing Xie
Zijian Liang
Long Zeng
29
0
0
11 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
29
0
0
11 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
51
1
0
11 Apr 2025
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
ViT
27
0
0
11 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
35
0
0
10 Apr 2025