ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.05499
  4. Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXivPDFHTML

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,336 papers shown
Title
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
69
1
0
22 Jan 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
35
0
0
22 Jan 2025
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu
T. Scargill
M. Gorlatova
70
2
0
22 Jan 2025
ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions
ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions
Shiyue Zhang
Zheng Chong
Xi Lu
Wenqing Zhang
Haoxiang Li
Xujie Zhang
Jiehui Huang
Xiao Dong
Xiaodan Liang
DiffM
40
0
0
21 Jan 2025
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Zhenhailong Wang
Haiyang Xu
Junyang Wang
Xi Zhang
Ming Yan
J. Zhang
Fei Huang
Heng Ji
43
9
0
20 Jan 2025
Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks
Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks
Michael Schwingshackl
Fabio Francisco Oberweger
Markus Murschitz
44
1
0
20 Jan 2025
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
Ruixuan Zhang
Beichen Wang
Juexiao Zhang
Zilin Bian
Chen Feng
K. Ozbay
39
2
0
17 Jan 2025
Enhancing Novel Object Detection via Cooperative Foundational Models
Enhancing Novel Object Detection via Cooperative Foundational Models
Rohit K Bharadwaj
Muzammal Naseer
Salman Khan
F. Khan
ObjD
VLM
133
1
0
17 Jan 2025
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM
Xin Hu
Janet Wang
Jihun Hamm
R. Yotsu
Zhengming Ding
92
0
0
17 Jan 2025
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
Divyansh Srivastava
Beatriz Cabrero-Daniel
Christian Berger
VLM
57
8
0
17 Jan 2025
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Abdalwhab Abdalwhab
A. Imran
Sina Heydarian
I. Iordanova
David St-Onge
41
0
0
16 Jan 2025
SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing
SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing
Varun Biyyala
Bharat Chanderprakash Kathuria
Jialu Li
Youshan Zhang
50
0
0
13 Jan 2025
Guided SAM: Label-Efficient Part Segmentation
Guided SAM: Label-Efficient Part Segmentation
S.B. van Rooij
G.J. Burghouts
VLM
38
0
0
13 Jan 2025
Toward Realistic Camouflaged Object Detection: Benchmarks and Method
Toward Realistic Camouflaged Object Detection: Benchmarks and Method
Zhimeng Xin
Tianxu Wu
Shiming Chen
Shuo Ye
Zijing Xie
Yixiong Zou
Xinge You
Yufei Guo
31
0
0
13 Jan 2025
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Juntao Ren
Priya Sundaresan
Dorsa Sadigh
Sanjiban Choudhury
Jeannette Bohg
37
14
0
13 Jan 2025
Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation
Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation
Zhenyang Feng
Zihe Wang
Saul Ibaven Bueno
Tomasz Frelek
Advikaa Ramesh
...
Hilmar Lapp
Charles V. Stewart
T. Berger-Wolf
Yu-Chuan Su
Wei-Lun Chao
46
0
0
12 Jan 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
42
1
0
12 Jan 2025
Multi-subject Open-set Personalization in Video Generation
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming Yang
Sergey Tulyakov
DiffM
VGen
69
7
0
10 Jan 2025
Supervision-free Vision-Language Alignment
Supervision-free Vision-Language Alignment
Giorgio Giannone
Ruoteng Li
Qianli Feng
Evgeny Perevodchikov
Rui Chen
Aleix M. Martinez
VLM
58
0
0
08 Jan 2025
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
Yuzhou Huang
Ziyang Yuan
Quande Liu
Qiulin Wang
Xintao Wang
Ruimao Zhang
Pengfei Wan
Di Zhang
Kun Gai
VGen
DiffM
35
10
0
08 Jan 2025
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Mingjie Pan
Jiyao Zhang
Tianshu Wu
Yinghao Zhao
Wenlong Gao
Hao Dong
LM&Ro
47
6
0
08 Jan 2025
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization
Kourosh Darvish
Marta Skreta
Yuchi Zhao
Naruki Yoshikawa
Sagnik Som
...
Han Hao
Haoping Xu
Alán Aspuru-Guzik
Animesh Garg
Florian Shkurti
57
21
0
08 Jan 2025
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park
Sebin Kim
Taehong Moon
Minkyu Kim
Kangwook Lee
Jaewoong Cho
DiffM
CoGe
62
2
0
08 Jan 2025
Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting
Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting
K. Gao
Liangzhi Li
Hongjie He
Dening Lu
Linlin Xu
Jonathan Li
GP
3DGS
32
2
0
08 Jan 2025
Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis
Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis
Yiliang Chen
Steven SC Ho
Cheng Xu
Yao Jie Xie
Wing-Fai Yeung
Shengfeng He
Jing Qin
LM&MA
28
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
86
11
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
94
48
0
03 Jan 2025
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Yifan Zhang
Junhui Hou
64
1
0
03 Jan 2025
Instruction-Guided Scene Text Recognition
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
71
3
0
03 Jan 2025
MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
Haoyu Zheng
Wenqiao Zhang
Zheqi Lv
Yu Zhong
Yang Dai
...
Yongliang Shen
Juncheng Billy Li
Dongping Zhang
Siliang Tang
Yueting Zhuang
DiffM
VGen
48
0
0
31 Dec 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
70
5
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
59
17
0
31 Dec 2024
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
H. Zhang
Tat-Seng Chua
Shuicheng Yan
59
37
0
31 Dec 2024
VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
Zhipeng Chen
Lan Yang
Yonggang Qi
Honggang Zhang
Kaiyue Pang
Ke Li
Yi-Zhe Song
DiffM
88
0
0
31 Dec 2024
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
46
3
0
31 Dec 2024
YOLO-UniOW: Efficient Universal Open-World Object Detection
YOLO-UniOW: Efficient Universal Open-World Object Detection
Lihao Liu
Juexiao Feng
Hui Chen
Ao Wang
Lin Song
J. Han
Guiguang Ding
ObjD
VLM
33
2
0
31 Dec 2024
AI-Powered Urban Transportation Digital Twin: Methods and Applications
AI-Powered Urban Transportation Digital Twin: Methods and Applications
Xuan Di
Yongjie Fu
Mehmet K.Turkcan
Mahshid Ghasemi
Zhaobin Mo
Chengbo Zang
Abhishek Adhikari
Z. Kostić
Gil Zussman
AI4CE
29
0
0
30 Dec 2024
Enhancing Vision-Language Tracking by Effectively Converting Textual
  Cues into Visual Cues
Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues
X. Feng
D. Zhang
Shuyan Hu
X. Li
M. Wu
Jie Zhang
Xiaojing Chen
K. Huang
40
0
0
27 Dec 2024
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Xiaoyang Liu
Boran Wen
Xinpeng Liu
Zizheng Zhou
Hongwei Fan
Cewu Lu
Lizhuang Ma
Yulong Chen
Y. Li
51
2
0
27 Dec 2024
Visual Prompting with Iterative Refinement for Design Critique
  Generation
Visual Prompting with Iterative Refinement for Design Critique Generation
Peitong Duan
Chin-yi Chen
Bjoern Hartmann
Yang Li
71
0
0
22 Dec 2024
Aria-UI: Visual Grounding for GUI Instructions
Aria-UI: Visual Grounding for GUI Instructions
Yuhao Yang
Yue Wang
Dongxu Li
Ziyang Luo
Bei Chen
C. Huang
Junnan Li
LM&Ro
LLMAG
106
14
0
20 Dec 2024
Towards a Training Free Approach for 3D Scene Editing
Towards a Training Free Approach for 3D Scene Editing
Vivek Madhavaram
Shivangana Rawat
Chaitanya Devaguptapu
Charu Sharma
Manohar Kaul
DiffM
67
0
0
17 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
100
3
0
16 Dec 2024
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng
Yu Han
Xijing Zhang
Tanghui Li
Yanting Zhang
Rui Fan
107
3
0
15 Dec 2024
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu
Hao Fei
Liangming Pan
William Yang Wang
Shuicheng Yan
Tat-Seng Chua
LRM
61
1
0
15 Dec 2024
Just a Few Glances: Open-Set Visual Perception with Image Prompt
  Paradigm
Just a Few Glances: Open-Set Visual Perception with Image Prompt Paradigm
Jinrong Zhang
Penghui Wang
Chunxiao Liu
Wei Liu
D. Jin
Qiong Zhang
Erli Meng
Zhengnan Hu
VLM
75
0
0
14 Dec 2024
BaB-ND: Long-Horizon Motion Planning with Branch-and-Bound and Neural Dynamics
Keyi Shen
Jiangwei Yu
Huan Zhang
Yunzhu Li
Yunzhu Li
73
1
0
12 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
156
0
0
12 Dec 2024
PrEditor3D: Fast and Precise 3D Shape Editing
PrEditor3D: Fast and Precise 3D Shape Editing
Ziya Erkoç
Can Gümeli
Chaoyang Wang
Matthias Nießner
Angela Dai
Peter Wonka
Hsin-Ying Lee
Peiye Zhuang
71
2
0
09 Dec 2024
Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and
  Annotation Framework
Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework
Jiuyi Xu
Meida Chen
Andrew Feng
Yangming Shi
Zifan Yu
57
0
0
09 Dec 2024
Previous
123...678...252627
Next