Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

16 May 2024
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
Han Gao
Hongjie Huang
Zhengyu Ma
Xiaoke Jiang
Yihao Chen
Yuda Xiong
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
    ObjD, VLM
ArXiv (abs) | PDF | HTML | HuggingFace (31 upvotes)

Papers citing "Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection"

43 / 43 papers shown
SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding
Keita Otani
Tatsuya Harada
80
0
0
30 Nov 2025
Stable Offline Hand-Eye Calibration for any Robot with Just One Mark
Sicheng Xie
Lingchen Meng
Zhiying Du
Shuyuan Tu
Haidong Cao
Jiaqi Leng
Z. F. Wu
Yu-Gang Jiang
180
0
0
21 Nov 2025
Multi-Step Reasoning for Embodied Question Answering via Tool Augmentation
Mingliang Zhai
Hansheng Liang
Xiaomeng Fan
Zhi Gao
Chuanhao Li
Che Sun
Xu Bin
Yuwei Wu
Yunde Jia
LRM
193
0
0
23 Oct 2025
Chimera: Compositional Image Generation using Part-based Concepting
Shivam Singh
Yiming Chen
Agneet Chatterjee
Amit Raj
James Hays
Yezhou Yang
Chitta Baral
DiffM
296
0
0
20 Oct 2025
Improved High-probability Convergence Guarantees of Decentralized SGD
Aleksandar Armacki
Ali H. Sayed
93
0
0
07 Oct 2025
On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond
Chenxiao Yang
Cai Zhou
David Wipf
Zhiyuan Li
DiffM
192
0
0
07 Oct 2025
Inferring Dynamic Physical Properties from Video Foundation Models
Guanqi Zhan
Xianzheng Ma
Weidi Xie
Andrew Zisserman
VGen
159
2
0
02 Oct 2025
Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
Vahid Mirjalili
Ramin Giahi
Sriram Kollipara
Akshay Kekuda
Kehui Yao
...
Kaushiki Nag
Sinduja Subramaniam
Topojoy Biswas
Evren Körpeoglu
Kannan Achan
VLM, LRM
95
0
0
26 Sep 2025
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Pengteng Li
Pinhao Song
Wuyang Li
Weiyu Guo
Huizai Yao
Ziyang Chen
Dugang Liu
Hui Xiong
LRM, VLM
126
1
0
19 Sep 2025
ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
Zhaoyang Li
Z. Ling
Yuchen Zhou
Litian Gong
Erdem Bıyık
H. Su
212
0
0
19 Sep 2025
Model-Agnostic Open-Set Air-to-Air Visual Object Detection for Reliable UAV Perception
Spyridon Loukovitis
Anastasios Arsenos
Vasileios Karampinis
Athanasios Voulodimos
88
1
0
11 Sep 2025
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review
Information Fusion (Inf. Fusion), 2025
Ranjan Sapkota
Manoj Karkee
ObjD, VLM
291
15
0
25 Aug 2025
RynnEC: Bringing MLLMs into Embodied World
Ronghao Dang
Yuqian Yuan
Yunxuan Mao
Kehan Li
Jiangpin Liu
Zhikai Wang
Xin Li
F. Wang
Deli Zhao
VGen, LM&Ro
208
6
0
19 Aug 2025
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Runqi Qiao
Qiuna Tan
Peiqing Yang
Y. Wang
X. Wang
...
Yida Xu
Jie Wang
Chong Sun
Chen Li
Honggang Zhang
OffRL, LRM
133
12
0
14 Aug 2025
Designing Object Detection Models for TinyML: Foundations, Comparative Analysis, Challenges, and Emerging Solutions
ACM Computing Surveys (ACM Comput. Surv.), 2025
Christophe El Zeinaty
W. Hamidouche
Glenn Herrou
D. Ménard
ObjD
134
0
0
11 Aug 2025
ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow
Shanshan Guo
Xiwen Liang
Junfan Lin
Yuzheng Zhuang
Guanbin Li
Xiaodan Liang
156
1
0
05 Aug 2025
InspectVLM: Unified in Theory, Unreliable in Practice
Conor Wallace
Isaac Corley
Jonathan Lwowski
MLLM, VLM
112
0
0
03 Aug 2025
Omni-Scan: Creating Visually-Accurate Digital Twin Object Models Using a Bimanual Robot with Handover and Gaussian Splat Merging
Tianshuang Qiu
Zehan Ma
Karim El-Refai
Hiya Shah
Chung Min Kim
Justin Kerr
Ken Goldberg
3DGS
172
2
0
01 Aug 2025
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng
Xiang Zhang
Zirui Wang
Haiyang Xu
Zeyuan Chen
Bingnan Li
Zhuowen Tu
168
6
0
01 Aug 2025
Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction
Zhensheng Yuan
Haozhi Huang
Zhen Xiong
Di Wang
Guanghua Yang
3DGS
145
2
0
30 Jul 2025
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
Ishani Mondal
Meera Bharadwaj
Ayush Roy
Aparna Garimella
Jordan L. Boyd-Graber
KELM
243
0
0
30 Jul 2025
Spatio-Temporal LLM: Reasoning about Environments and Actions
Haozhen Zheng
Beitong Tian
Mingyuan Wu
Zhenggang Tang
Klara Nahrstedt
Alex Schwing
LRM
204
3
0
07 Jul 2025
NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation
Max Gandyra
Alessandro Santonicola
Michael Beetz
261
1
0
02 Jul 2025
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
Hongyan Zhi
Peihao Chen
Siyuan Zhou
Yubo Dong
Quanxi Wu
Lei Han
Mingkui Tan
389
13
0
06 Jun 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Chenbin Pan
Wenbin He
Zhengzhong Tu
Liu Ren
LRM, VLM
501
2
0
29 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM, VGen
515
37
0
07 May 2025
OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
338
2
0
06 May 2025
Aligning Anime Video Generation with Human Feedback
Bingwen Zhu
Yudong Jiang
Baohan Xu
Siqian Yang
Mingyu Yin
Yidi Wu
Huyang Sun
Zuxuan Wu
EGVM, VGen
387
4
0
14 Apr 2025
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Beak
Hyeonwoo Kim
Hanbyul Joo
305
4
0
25 Mar 2025
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Yong Zhong
Zhuoyi Yang
Jiayan Teng
Xiaohan Zhang
Chongxuan Li
VGen
420
18
0
18 Mar 2025
AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations
Quang-Trung Truong
Wong Yuk Kwan
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VGen
318
1
0
17 Mar 2025
Embodied Crowd Counting
Runling Long
Yunlong Wang
Jia Wan
Xiang Deng
Xinting Zhu
Weili Guan
Antoni B. Chan
Liqiang Nie
335
0
0
11 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
932
12
0
11 Mar 2025
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
Dennis Rotondi
Fabio Scaparro
Hermann Blum
Kai O. Arras
318
7
0
10 Mar 2025
Consistent Image Layout Editing with Diffusion Models
Tao Xia
Yudi Zhang
Ting Liu
Lei Zhang
DiffM
291
1
0
09 Mar 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
International Conference on 3D Vision (3DV), 2023
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
579
14
0
24 Feb 2025
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
322
10
0
22 Jan 2025
Instruction-Guided Scene Text Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
500
18
0
03 Jan 2025
HandOS: 3D Hand Reconstruction in One Stage
Computer Vision and Pattern Recognition (CVPR), 2024
Xingyu Chen
Zhuheng Song
Xiaoke Jiang
Yaoqing Hu
Junzhi Yu
Lei Zhang
3DH, HAI
500
5
0
02 Dec 2024
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao
Weiheng Zhong
Zhou Jiang
Dong Fang
Zhongyue Zhang
...
Fan Jia
Tiancai Wang
Haoqiang Fan
Osamu Yoshie
582
16
0
29 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM, LRM
555
22
0
27 Nov 2024
RT-GuIDE: Real-Time Gaussian Splatting for Information-Driven Exploration
IEEE Robotics and Automation Letters (RA-L), 2024
Yuezhan Tao
Dexter Ong
Varun Murali
Igor Spasojevic
Pratik Chaudhari
Vijay Kumar
3DGS
439
12
0
26 Sep 2024
OW-Rep: Open World Object Detection with Instance Representation Learning
Sunoh Lee
Minsik Jeon
Jihong Min
Junwon Seo
ObjD
1.2K
1
0
24 Sep 2024