arXiv:2303.05499
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,335 papers shown
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Y. Zhang
MLLM
VLM
47
1
0
01 Mar 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
ObjD
VLM
47
0
0
28 Feb 2025
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian
Zhongliang Guo
Bowen Deng
Chun Tong Lei
Shuai Zhao
Chun Pong Lau
Xiaopeng Hong
Michael P. Pound
DiffM
59
0
0
28 Feb 2025
Technical Report for ReID-SAM on SkiTB Visual Tracking Challenge 2025
Kunjun Li
Cheng-Yen Yang
Hsiang-Wei Huang
Jenq-Neng Hwang
49
0
0
28 Feb 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
J. Liu
Peng Wang
Guoqing Wang
Y. Yang
H. Shen
ObjD
79
0
0
27 Feb 2025
LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding
Ang Cao
Sergio Arnaud
Oleksandr Maksymets
Jianing Yang
Ayush Jain
...
Aravind Rajeswaran
Franziska Meier
Justin Johnson
Jeong Joon Park
Alexander Sax
63
0
0
27 Feb 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min-Ling Zhang
60
0
0
27 Feb 2025
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
Yuhao Li
Mirana Claire Angel
Salman Khan
Yu Zhu
Jinqiu Sun
Yanning Zhang
F. Khan
VGen
46
0
0
27 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
66
0
0
26 Feb 2025
A Survey on Foundation-Model-Based Industrial Defect Detection
Tianle Yang
Luyao Chang
Jiadong Yan
J. Li
Zhi Wang
Ke Zhang
AI4CE
76
2
0
26 Feb 2025
FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks
Tanawan Premsri
Parisa Kordjamshidi
43
1
0
25 Feb 2025
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation
Young Beom Woo
Sun Eung Kim
DiffM
43
0
0
24 Feb 2025
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models
Yaqi Sun
Kyohei Atarashi
Koh Takeuchi
Hisashi Kashima
MLLM
49
0
0
24 Feb 2025
SLABIM: A SLAM-BIM Coupled Dataset in HKUST Main Building
Haoming Huang
Zhijian Qiao
Zehuan Yu
Chuhao Liu
Shaojie Shen
Fumin Zhang
Huan Yin
41
0
0
24 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
67
0
0
24 Feb 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
55
7
0
24 Feb 2025
Anatomical grounding pre-training for medical phrase grounding
Wenjun Zhang
Shakes Chandra
Aaron Nicolson
MedIm
28
0
0
23 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
51
0
0
23 Feb 2025
VaLID: Verification as Late Integration of Detections for LiDAR-Camera Fusion
Vanshika Vats
Marzia Binta Nizam
James Davis
3DPC
60
0
0
21 Feb 2025
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
Zheyuan Zhang
Runze Li
Tasnim Kabir
Jordan Boyd-Graber
46
0
0
21 Feb 2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
D. She
Mushui Liu
Jingxuan Pang
Jin Wang
Zhen Yang
...
Yi Wang
Qihan Huang
Haobin Tang
Yunlong Yu
Siming Fu
VGen
91
4
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
126
2
0
21 Feb 2025
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
Luzhou Ge
Xiangyu Zhu
Zhuo Yang
Xuesong Li
3DGS
70
0
0
21 Feb 2025
SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models
Peter Carragher
Nikitha Rao
Abhinand Jha
R Raghav
Kathleen M. Carley
VLM
49
0
0
19 Feb 2025
A Survey of Text Classification Under Class Distribution Shift
Adriana Valentina Costache
Silviu Florin Gheorghe
Eduard Poesina
Paul Irofti
Radu Tudor Ionescu
OOD
VLM
60
0
0
18 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
24
2
0
18 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
131
3
0
18 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
107
9
0
18 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
X. Zhang
Jingyu Wang
Wenbing Tao
52
4
0
17 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
48
2
0
17 Feb 2025
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion
Yufan Zhou
Haoyu Shen
Huan Wang
DiffM
100
0
0
17 Feb 2025
Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
Weirui Ye
Fangchen Liu
Z. Ding
Yang Gao
Oleh Rybkin
Pieter Abbeel
VGen
OffRL
84
2
0
14 Feb 2025
HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation
Yibo Liu
Zhaodong Jiang
Binbin Xu
Guile Wu
Y. Ren
Tongtong Cao
Bingbing Liu
Rui Heng Yang
Amir Rasouli
J. Shan
38
1
0
14 Feb 2025
Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning
Yuhang Dong
Haizhou Ge
Yupei Zeng
J. Zhang
Beiwen Tian
...
Yufei Jia
Ruixiang Wang
Ran Yi
Guyue Zhou
Longhua Ma
51
0
0
11 Feb 2025
Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior
Lee Hyoseok
Kyeong Seon Kim
Kwon Byung-Ki
Tae-Hyun Oh
MDE
100
0
0
10 Feb 2025
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
3DGS
3DPC
AI4CE
55
1
0
09 Feb 2025
LeAP: Consistent multi-domain 3D labeling using Foundation Models
Simon Gebraad
Andras Palffy
Holger Caesar
111
1
0
06 Feb 2025
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Keyi Zhu
Jiajia Li
Kaixiang Zhang
Chaaran Arunachalam
Siddhartha Bhattacharya
R. Lu
Zhaojian Li
73
0
0
03 Feb 2025
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Joshua R. Waite
Md Zahid Hasan
Qisai Liu
Zhanhong Jiang
Chinmay Hegde
S. Sarkar
OffRL
SyDa
158
1
0
31 Jan 2025
Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
A. H. Tan
Angus Fung
Haitong Wang
G. Nejat
87
2
0
31 Jan 2025
A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches
Luca Ciampi
Ali Azmoudeh
Elif Ecem Akbaba
Erdi Sarıtaş
Ziya Ata Yazıcı
H. K. Ekenel
Giuseppe Amato
Fabrizio Falchi
97
0
0
31 Jan 2025
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback
Sayeh Gholipour Picha
D. Chanti
A. Caplier
MedIm
53
0
0
29 Jan 2025
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control
Aosong Feng
Weikang Qiu
Jinbin Bai
Xiao Zhang
Zhen Dong
Kaicheng Zhou
Rex Ying
Leandros Tassiulas
DiffM
58
6
0
28 Jan 2025
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Mai A. Shaaban
Adnan Khan
Mohammad Yaqub
LM&MA
78
2
0
28 Jan 2025
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation
Reza Akbarian Bafghi
Carden Bagwell
Avinash Ravichandran
Ashish Shrivastava
M. Raissi
46
0
0
28 Jan 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
85
0
0
25 Jan 2025
PAID: A Framework of Product-Centric Advertising Image Design
Hongyu Chen
Min Zhou
Jing Jiang
Jiale Chen
Yang Lu
Bo Xiao
T. Ge
Bo Zheng
DiffM
VLM
38
0
0
24 Jan 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
37
7
0
23 Jan 2025
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong
Meng Lan
Q. Zhang
L. Zhang
VOS
VGen
65
1
0
23 Jan 2025
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
69
1
0
22 Jan 2025