ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.05499
  4. Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXivPDFHTML

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,331 papers shown
Title
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Fernando Cladera
Zachary Ravichandran
Jason Hughes
Varun Murali
Carlos Nieto-Granda
M. Hsieh
George J. Pappas
Camillo J. Taylor
Vijay R. Kumar
11
0
0
14 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
16
0
0
13 May 2025
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Reihaneh Mirjalili
Tobias Jülg
Florian Walter
Wolfram Burgard
22
0
0
13 May 2025
BETTY Dataset: A Multi-modal Dataset for Full-Stack Autonomy
BETTY Dataset: A Multi-modal Dataset for Full-Stack Autonomy
Micah Nye
Ayoub Raji
Andrew Saba
Eidan Erlich
Robert Exley
...
Ritesh Misra
Matthew Sivaprakasam
Marko Bertogna
Deva Ramanan
Sebastian A. Scherer
26
0
0
12 May 2025
The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned
The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned
David Cáceres-Domínguez
M. Iannotta
Abhishek Kashyap
Shuo Sun
Yuxuan Yang
...
Zheng Jia
Graziano Carriero
Sofia Lindqvist
Silvio Di Castro
Matteo Iovino
11
0
0
11 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
C. Hong
AI4CE
21
0
0
11 May 2025
UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
Xueyang Guo
Hongwei Hu
Chengye Song
J. Chen
Zilin Zhao
Yu Fu
Bowen Guan
Zhenze Liu
16
0
0
11 May 2025
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Jingyao Wang
Jianqi Zhang
Wenwen Qiang
Changwen Zheng
VLM
27
0
0
10 May 2025
Describe Anything in Medical Images
Describe Anything in Medical Images
Xi Xiao
Yunbei Zhang
Thanh-Huy Nguyen
Ba Thinh Lam
Janet Wang
...
Xingjian Li
X. U. Wang
Hao Xu
Tianming Liu
Min Xu
MedIm
VLM
40
0
0
09 May 2025
Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization
Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization
Zhuang Qi
Sijin Zhou
Lei Meng
Han Hu
Han Yu
Xiangxu Meng
FedML
CML
72
0
0
08 May 2025
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
98
0
0
08 May 2025
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
Weichen Zhang
Chen Gao
Shiquan Yu
Ruiying Peng
Baining Zhao
Qian Zhang
Jinqiang Cui
Xinlei Chen
Y. Li
LLMAG
LM&Ro
40
0
0
08 May 2025
Visual Affordances: Enabling Robots to Understand Object Functionality
Visual Affordances: Enabling Robots to Understand Object Functionality
Tommaso Apicella
Alessio Xompero
Andrea Cavallaro
39
0
0
08 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
26
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
52
0
0
07 May 2025
Corner Cases: How Size and Position of Objects Challenge ImageNet-Trained Models
Corner Cases: How Size and Position of Objects Challenge ImageNet-Trained Models
Mishal Fatima
Steffen Jung
M. Keuper
31
0
0
06 May 2025
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
43
0
0
06 May 2025
6D Pose Estimation on Spoons and Hands
6D Pose Estimation on Spoons and Hands
Kevin Tan
Fan Yang
Y. Chen
40
0
0
05 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
44
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
60
0
0
05 May 2025
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li
Lingyun Xu
M. Zhang
Jiaming Liu
Yan Shen
...
Jiahui Xu
Liang Heng
Siyuan Huang
S. Zhang
Hao Dong
LM&Ro
42
0
0
04 May 2025
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
Aidan Curtis
Hao Tang
Thiago Veloso
Kevin Ellis
Joshua B. Tenenbaum
Tomás Lozano-Pérez
Leslie Pack Kaelbling
39
0
0
04 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
34
0
0
04 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Robotic Visual Instruction
Robotic Visual Instruction
Y. Li
Ziyang Gong
H. Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
69
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
57
0
0
01 May 2025
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
Pranav Saxena
Nishant Raghuvanshi
Neena Goveas
69
0
0
30 Apr 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
81
0
0
30 Apr 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Z. Wang
Tao Jin
DiffM
103
2
0
30 Apr 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
20
0
0
30 Apr 2025
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
Y. Li
Lu Si
Y. T. Hou
Chengaung Liu
B. Li
Hongjian Fang
J. Zhang
71
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
37
0
0
30 Apr 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Y. Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Y. Yang
Qi Zhang
Lili Qiu
VLM
76
0
0
30 Apr 2025
Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection
Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection
Daniel Bogdoll
Rajanikant Ananta
Abeyankar Giridharan
Isabel Moore
Gregory Stevens
Henry X. Liu
VLM
51
0
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
X. Li
MLLM
71
0
0
29 Apr 2025
If Concept Bottlenecks are the Question, are Foundation Models the Answer?
If Concept Bottlenecks are the Question, are Foundation Models the Answer?
Nicola Debole
Pietro Barbiero
Francesco Giannini
Andrea Passerini
Stefano Teso
Emanuele Marconato
71
0
0
28 Apr 2025
Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification
Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification
Nikolaos Chaidos
Angeliki Dimitriou
Nikolaos Spanos
Athanasios Voulodimos
Giorgos Stamou
25
1
0
28 Apr 2025
Simultaneous Pick and Place Detection by Combining SE(3) Diffusion Models with Differential Kinematics
Simultaneous Pick and Place Detection by Combining SE(3) Diffusion Models with Differential Kinematics
Tianyi Ko
Takuya Ikeda
Koichi Nishiwaki
35
0
0
28 Apr 2025
TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians
TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians
Letian Huang
Dongwei Ye
Jialin Dan
Chengzhi Tao
Huiwen Liu
Kun Zhou
Bo Ren
Y. Li
Yanwen Guo
Jie Guo
37
1
0
26 Apr 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
86
0
0
26 Apr 2025
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting
Yiming Zhao
Guorong Li
Laiyun Qing
Amin Beheshti
Jian Yang
Michael Sheng
Yuankai Qi
Qingming Huang
VLM
VPVLM
70
0
0
24 Apr 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas J. Guibas
Minhyuk Sung
LRM
41
0
0
24 Apr 2025
MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin
MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin
Sausar Karaf
Mikhail Martynov
Oleg Sautenkov
Zhanibek Darush
Dzmitry Tsetserukou
29
1
0
23 Apr 2025
VideoVista-CulturalLingo: 360$^\circ$ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
VideoVista-CulturalLingo: 360∘^\circ∘ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
Xinyu Chen
Yunxin Li
Haoyuan Shi
Baotian Hu
Wenhan Luo
Yaowei Wang
M. Zhang
ELM
62
0
0
23 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lei Ren
Huixing Jiang
Chen Wei
Xiaojie Wang
Ruifan Li
Fangxiang Feng
DiffM
69
0
0
22 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
X. Wang
Xiangnan He
61
0
0
22 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
29
0
0
22 Apr 2025
DRAWER: Digital Reconstruction and Articulation With Environment Realism
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia
Entong Su
Marius Memmel
Arhan Jain
Raymond Yu
Numfor Mbiziwo-Tiapo
Ali Farhadi
Abhishek Gupta
Shenlong Wang
Wei-Chiu Ma
VGen
28
1
0
21 Apr 2025
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li
Jinglin Xu
Yunzhen Zhao
Yuxin Peng
ObjD
27
0
0
21 Apr 2025
1234...252627
Next