2303.05499
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,335 papers shown
Title
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLM
KELM
OffRL
24
0
0
08 Apr 2025
On the Importance of Conditioning for Privacy-Preserving Data Augmentation
Julian Lorenz
K. Ludwig
Valentin Haug
Rainer Lienhart
DiffM
36
0
0
08 Apr 2025
MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos
Alexey Gavryushin
Xi Wang
Robert J. S. Malate
Chenyu Yang
X. Jia
Shubh Goel
Davide Liconti
René Zurbrugg
Robert K. Katzschmann
Marc Pollefeys
34
0
0
08 Apr 2025
Measuring Déjà vu Memorization Efficiently
Narine Kokhlikyan
Bargav Jayaraman
Florian Bordes
Chuan Guo
Kamalika Chaudhuri
23
1
0
08 Apr 2025
Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images
Wenzhao Tang
Weihang Li
Xiucheng Liang
Olaf Wysocki
Filip Biljecki
Christoph Holst
Boris Jutzi
23
1
0
07 Apr 2025
Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection
Jiancheng Pan
Yanxing Liu
Xiao He
Long Peng
Jiahao Li
Yuze Sun
Xiaomeng Huang
33
0
0
06 Apr 2025
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
Hamza Riaz
A. Smeaton
32
0
0
05 Apr 2025
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang
Y. Li
Yanhong Zeng
Yuwei Guo
D. Lin
Tianfan Xue
Bo Dai
VGen
24
0
0
05 Apr 2025
Simultaneous Learning of Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
Kotaro Ikeda
Masanori Koyama
Jinzhe Zhang
Kohei Hayashi
Kenji Fukumizu
OT
98
0
0
04 Apr 2025
BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
Van Nguyen Nguyen
Stephen Tyree
Andrew Guo
Mederic Fourmy
Anas Gouda
...
Stan Birchfield
Jiri Matas
Yann Labbé
M. Sundermeyer
Tomás Hodan
3DPC
48
1
0
03 Apr 2025
Deep Reinforcement Learning via Object-Centric Attention
Jannis Blüml
Cedric Derstroff
Bjarne Gregori
Elisabeth Dillies
Quentin Delfosse
Kristian Kersting
OCL
44
0
0
03 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
44
0
0
03 Apr 2025
MinkOcc: Towards real-time label-efficient semantic occupancy prediction
Samuel Sze
Daniele De Martini
Lars Kunze
3DPC
44
0
0
03 Apr 2025
Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
Chenyu Zhang
Daniil Cherniavskii
Andrii Zadaianchuk
Antonios Tragoudaras
Antonios Vozikis
Thijmen Nijdam
Derck W. E. Prinzhorn
Mark Bodracska
N. Sebe
E. Gavves
EGVM
VGen
46
0
0
03 Apr 2025
Multi-party Collaborative Attention Control for Image Customization
Han Yang
Chuanguang Yang
Qiuli Wang
Zhulin An
Weilun Feng
Libo Huang
Y. Xu
DiffM
25
0
0
02 Apr 2025
UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting
Jaehoon Choi
Dongki Jung
Yonghan Lee
Sungmin Eum
Dinesh Manocha
H. Kwon
3DGS
43
0
0
02 Apr 2025
Multimodal Reference Visual Grounding
Yangxiao Lu
Ruosen Li
Liqiang Jing
Jikai Wang
Xinya Du
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
ObjD
76
0
0
02 Apr 2025
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang
Duo Peng
Feng Chen
Y. Yang
Yinjie Lei
DiffM
74
0
0
02 Apr 2025
Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation
Aleksander Plocharski
Jan Swidzinski
Przemyslaw Musialski
DiffM
30
0
0
02 Apr 2025
Global Intervention and Distillation for Federated Out-of-Distribution Generalization
Zhuang Qi
Runhui Zhang
Lei Meng
Wei Wu
Yachong Zhang
X. Meng
FedML
68
1
0
01 Apr 2025
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan
Hong-Xing Yu
Sirui Chen
L. Fei-Fei
Jiajun Wu
VGen
62
1
0
01 Apr 2025
CrowdVLM-R1: Expanding R1 Ability to Vision Language Model for Crowd Counting using Fuzzy Group Relative Policy Reward
Zhiqiang Wang
Pengbin Feng
Yanbin Lin
Shuzhang Cai
Zongao Bian
Jinghua Yan
Xingquan Zhu
32
1
0
31 Mar 2025
Consistent Subject Generation via Contrastive Instantiated Concepts
Lee Hsin-Ying
Kelvin Chan
Ming Yang
DiffM
90
0
0
31 Mar 2025
DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model
Ming Yuan
Sichao Wang
Chuang Zhang
Lei He
Qing Xu
Jianqiang Wang
DiffM
MDE
47
0
0
31 Mar 2025
Detecting Glioma, Meningioma, and Pituitary Tumors, and Normal Brain Tissues based on Yolov11 and Yolov8 Deep Learning Models
Ahmed M. Taha
Salah A. Aly
Mohamed F. Darwish
31
0
0
31 Mar 2025
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices
Bosung Kim
Kyuhwan Lee
Isu Jeong
Jungmin Cheon
Yeojin Lee
Seulki Lee
VGen
45
1
0
31 Mar 2025
PhysPose: Refining 6D Object Poses with Physical Constraints
Martin Malenický
Martin Cífka
Médéric Fourmy
Louis Montaut
Justin Carpentier
Josef Sivic
Vladimir Petrik
36
0
0
30 Mar 2025
EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing
Hongxiang Jiang
Jihao Yin
Qixiong Wang
Jiaqi Feng
Guo Chen
48
0
0
30 Mar 2025
Object Isolated Attention for Consistent Story Visualization
Xiangyang Luo
Junhao Cheng
Yifan Xie
Xin Zhang
Tao Feng
Z. Liu
Fei Ma
Fei Richard Yu
DiffM
39
1
0
30 Mar 2025
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva
Andrew Zisserman
46
0
0
30 Mar 2025
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
Tianming Liang
Haichao Jiang
Wei-Shi Zheng
Jian-Fang Hu
39
0
0
30 Mar 2025
Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
Jianhua Sun
Jiude Wei
Y. Li
Cewu Lu
LM&Ro
54
1
0
30 Mar 2025
Efficient Adaptation For Remote Sensing Visual Grounding
Hasan Moughnieh
Mohamad Chalhoub
Hasan Nasrallah
Cristiano Nattero
Paolo Campanella
Giovanni Nico
A. Ghandour
46
0
0
29 Mar 2025
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min
Daehyeon Choi
Kyeongmin Yeo
Jihyun Lee
Minhyuk Sung
49
0
0
28 Mar 2025
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving
Fuhao Li
Huan Jin
Bin-Bin Gao
Liaoyuan Fan
Lihui Jiang
Long Zeng
63
0
0
28 Mar 2025
Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
Ukcheol Shin
Jinsun Park
3DV
MDE
36
0
0
28 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
42
0
0
28 Mar 2025
Cultivating Game Sense for Yourself: Making VLMs Gaming Experts
Wenxuan Lu
Jiangyang He
Zhanqiu Zhang
Yiwen Guo
Tianning Zang
42
0
0
27 Mar 2025
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao
Albert Zhai
Shenlong Wang
VGen
46
1
0
27 Mar 2025
GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection
Xingyu Peng
Si Liu
Chen Gao
Yan Bai
Beipeng Mu
Xiaofei Wang
Huaxia Xia
62
0
0
26 Mar 2025
Robust Flower Cluster Matching Using The Unscented Transform
Andy Chu
Rashik Shrestha
Yu Gu
Jason N. Gross
59
0
0
26 Mar 2025
VideoGEM: Training-free Action Grounding in Videos
Felix Vogel
Walid Bousselham
Anna Kukleva
Nina Shvetsova
Hilde Kuehne
LM&Ro
VLM
110
0
0
26 Mar 2025
LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
Yejin Kwon
Daeun Moon
Youngje Oh
Hyunsoo Yoon
71
0
0
26 Mar 2025
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu
Zijie Xin
Yuhan Fu
Ruixiang Zhao
Bangxiang Lan
Xirong Li
39
0
0
25 Mar 2025
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Hao Guo
Jianfei Zhu
Wei Fan
Chunzhi Yi
Feng Jiang
ObjD
63
0
0
25 Mar 2025
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim
Taehoon Yoon
Jisung Hwang
Minhyuk Sung
DiffM
54
1
0
25 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
42
0
0
25 Mar 2025
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Beak
Hyeonwoo Kim
Hanbyul Joo
41
0
0
25 Mar 2025
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay Kulkarni
Ge Yan
Chung-En Sun
Tuomas P. Oikarinen
Tsui-Wei Weng
39
0
0
25 Mar 2025
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang
Jinghao Li
Yu-Wing Tai
DiffM
64
0
0
25 Mar 2025