ResearchTrend.AI
arXiv: 2304.07193
DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
    VLM
    CLIP
    SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 2,168 papers shown
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
39
0
0
13 May 2025
Synthetic Similarity Search in Automotive Production
Christoph Huber
Ludwig Schleeh
Dino Knoll
Michael Guthe
19
0
0
12 May 2025
H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
X. Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
16
0
0
12 May 2025
Vision Foundation Model Embedding-Based Semantic Anomaly Detection
M. Ronecker
Matthew Foutter
Amine Elhafsi
Daniele Gammelli
Ihor Barakaiev
Marco Pavone
Daniel Watzenig
16
0
0
12 May 2025
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie
Zequn Zeng
Hao Zhang
Yucheng Ding
Y. Wang
Zhengjue Wang
Bo Chen
Hongwei Liu
OT
21
0
0
12 May 2025
Hand-Shadow Poser
Hao Xu
Yinqiao Wang
Niloy J. Mitra
Shuaicheng Liu
Pheng-Ann Heng
Chi-Wing Fu
3DH
24
0
0
11 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
21
0
0
10 May 2025
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu
Keyu Wu
Yao Feng
Youyi Zheng
M. Black
DiffM
3DH
42
0
0
09 May 2025
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Christos Plachouras
Julien Guinot
George Fazekas
Elio Quinton
Emmanouil Benetos
Johan Pauwels
38
1
0
09 May 2025
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
V. Bhat
Yu-Hsiang Lan
P. Krishnamurthy
Ramesh Karri
Farshad Khorrami
43
0
0
09 May 2025
CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking
Weihong Li
Xiaoqiong Liu
Heng Fan
L. Zhang
16
0
0
09 May 2025
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
Qingwen Bu
Y. Yang
Jisong Cai
Shenyuan Gao
Guanghui Ren
Maoqing Yao
Ping Luo
Hongyang Li
39
0
0
09 May 2025
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Alexander Lappe
M. Giese
19
0
0
09 May 2025
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao
Amy Lin
Jeff Tan
Jason Y. Zhang
Deva Ramanan
Shubham Tulsiani
VGen
44
0
0
08 May 2025
Learning to Drive Anywhere with Model-Based Reannotation
Noriaki Hirose
Lydia Ignatova
Kyle Stachowicz
Catherine Glossop
Sergey Levine
Dhruv Shah
16
0
0
08 May 2025
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation
Yonwoo Choi
3DGS
VGen
60
0
0
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Y. Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen
Yikai Wang
Wenqiang Sun
Feng Wang
Yiwen Chen
Huaping Liu
27
0
0
07 May 2025
Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
Abdulaziz Almuzairee
Rohan Patil
Dwait Bhatt
Henrik I. Christensen
27
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
50
0
0
07 May 2025
MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection
Zhihao Zhang
Abhinav Kumar
Girish Chandar Ganesan
Xiaoming Liu
62
0
0
07 May 2025
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Feng Liu
Nicholas Chimitt
Lanqing Guo
Jitesh Jain
Aditya Kane
...
Arun Ross
Humphrey Shi
Zhangyang Wang
A. Jain
Xiaoming Liu
CVBM
22
0
0
07 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
75
0
0
07 May 2025
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu
Siyuan Li
Ajad Chhatkuli
Prune Truong
Luc Van Gool
Federico Tombari
37
0
0
07 May 2025
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Nikita Ravi
Abhinav Goel
James C. Davis
George K. Thiruvathukal
35
0
0
06 May 2025
PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model
Y.B. Wang
S.Z. Zhou
J.F. Wu
T. Hu
J.N. Zhang
Z. Li
Yanzhe Liu
DiffM
VGen
49
0
0
06 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
32
0
0
06 May 2025
CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection
Haoyu Bai
Jie Wang
Gaomin Li
X. Li
Xiaohu Zhang
Xia Yang
25
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
62
0
0
06 May 2025
Real-Time Person Image Synthesis Using a Flow Matching Model
Jiwoo Jeong
Kirok Kim
Wooju Kim
Nam-Joon Kim
3DH
56
0
0
06 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
42
0
0
05 May 2025
Learning 3D Persistent Embodied World Models
Siyuan Zhou
Yilun Du
Yuncong Yang
Lei Han
Peihao Chen
Dit-Yan Yeung
Chuang Gan
VGen
42
0
0
05 May 2025
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
42
0
0
05 May 2025
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models
Kuofeng Gao
Yufei Zhu
Yiming Li
Jiawang Bai
Yong-Liang Yang
Z. Li
Shu-Tao Xia
34
0
0
05 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
Utsav Nareti
S. Chattopadhyay
Prolay Mallick
Suraj Kumar
Ayush Vikas Daga
Chandranath Adak
Adarsh Wase
Arjab Roy
10
0
0
05 May 2025
SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting
Shubhendu Jena
Shishir Reddy Vutukur
A. Boukhayma
3DGS
42
1
0
04 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
40
0
0
04 May 2025
Always Skip Attention
Yiping Ji
Hemanth Saratchandran
Peyman Moghaddam
Simon Lucey
49
0
0
04 May 2025
Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study
Ali Mammadov
Loic Le Folgoc
Julien Adam
Anne Buronfosse
Gilles Hayem
Guillaume Hocquet
Pietro Gori
SSL
35
0
0
02 May 2025
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
39
0
0
02 May 2025
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Gaozheng Pei
Ke Ma
Yingfei Sun
Qianqian Xu
Q. Huang
DiffM
36
0
0
02 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
L. Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAML
VLM
49
0
0
02 May 2025
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Simon Giebenhain
Tobias Kirschstein
Martin Rünz
Lourdes Agapito
Matthias Nießner
CVBM
3DH
52
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
53
0
0
30 Apr 2025
Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design
Vasudev Sharma
Ahmed Alagha
Abdelhakim Khellaf
Vincent Quoc-Huy Trinh
Mahdi S. Hosseini
33
0
0
30 Apr 2025
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer
Olaf Dünkel
Christian Theobalt
Adam Kortylewski
24
0
0
30 Apr 2025
Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions
Ziyi Dong
Chengxing Zhou
Weijian Deng
Pengxu Wei
Xiangyang Ji
Liang Lin
MQ
41
0
0
30 Apr 2025
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
93
0
0
30 Apr 2025
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
Florian Vahl
Jörn Griepenburg
Jan Gutsche
Jasper Güldenstein
Jianwei Zhang
VGen
39
0
0
29 Apr 2025