ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.05499
  4. Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXivPDFHTML

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,335 papers shown
Title
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Xinyu Zhang
Haonan Chang
Yuhan Liu
Abdeslam Boularias
3DGS
39
0
0
12 Mar 2025
Online Language Splatting
Saimouli Katragadda
Cho-Ying Wu
Yuliang Guo
Xinyu Huang
G. Huang
Liu Ren
3DGS
OffRL
60
0
0
12 Mar 2025
InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images
Jiun Tian Hoe
Weipeng Hu
Wei Zhou
Chao Xie
Ziwei Wang
Chee Seng Chan
Xudong Jiang
Y. Tan
61
0
0
12 Mar 2025
Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding
Tim Steinke
Martin Buchner
Niclas Vodisch
Abhinav Valada
55
0
0
11 Mar 2025
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou
Manli Tao
Chaoyang Zhao
Haiyun Guo
Honghui Dong
Ming Tang
J. T. Wang
46
0
0
11 Mar 2025
DiffEGG: Diffusion-Driven Edge Generation as a Pixel-Annotation-Free Alternative for Instance Annotation
Sanghyun Jo
Ziseok Lee
Wooyeol Lee
Kyungsu Kim
34
0
0
11 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
119
0
0
11 Mar 2025
S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
Guangting Zheng
Jiajun Deng
Xiaomeng Chu
Yu Yuan
Houqiang Li
Yanyong Zhang
3DGS
45
0
0
11 Mar 2025
FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
Yuzhi Lai
Shenghai Yuan
Boya Zhang
Benjamin Kiefer
Peizheng Li
Andreas Zell
36
1
0
11 Mar 2025
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen
Boyang Sun
Anran Zhang
Marc Pollefeys
Stefan Leutenegger
LM&Ro
65
0
0
10 Mar 2025
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang
Zhen Han
Chaojie Mao
J. Zhang
Yulin Pan
Yu Liu
DiffM
VGen
44
5
0
10 Mar 2025
DreamRelation: Relation-Centric Video Customization
Yujie Wei
Shiwei Zhang
Hangjie Yuan
Biao Gong
Longxiang Tang
...
Haonan Qiu
Hengjia Li
Shuai Tan
Y. Zhang
Hongming Shan
VGen
68
1
0
10 Mar 2025
Safety Guardrails for LLM-Enabled Robots
Zachary Ravichandran
Alexander Robey
Vijay R. Kumar
George Pappas
Hamed Hassani
56
0
0
10 Mar 2025
Visual and Text Prompt Segmentation: A Novel Multi-Model Framework for Remote Sensing
Xing Zi
Kairui Jin
Xian Tao
Jun Li
Ali Braytee
Rajiv Ratn Shah
Mukesh Prasad
VLM
62
0
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
52
0
0
10 Mar 2025
Multi-Modal 3D Mesh Reconstruction from Images and Text
Melvin Reka
Tessa Pulli
Markus Vincze
34
0
0
10 Mar 2025
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning
Jiazheng Liu
Sipeng Zheng
Börje F. Karlsson
Zongqing Lu
32
0
0
10 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
VLM
ObjD
72
1
0
10 Mar 2025
PE3R: Perception-Efficient 3D Reconstruction
Jie Hu
Shizun Wang
Xinchao Wang
61
0
0
10 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
64
3
0
10 Mar 2025
Attention, Please! PixelSHAP Reveals What Vision-Language Models Actually Focus On
Roni Goldshmidt
MLLM
VLM
39
0
0
09 Mar 2025
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
Shijia Zhao
Qiming Xia
Xusheng Guo
Pufan Zou
Maoji Zheng
Hai Wu
Chenglu Wen
Cheng-Yu Wang
3DPC
62
0
0
09 Mar 2025
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow
Evelien Riddell
Yimu Wang
Sean Sedwards
Krzysztof Czarnecki
3DPC
46
0
0
09 Mar 2025
FloPE: Flower Pose Estimation for Precision Pollination
FloPE: Flower Pose Estimation for Precision Pollination
Rashik Shrestha
Madhav Rijal
T. Smith
Yu Gu
39
0
0
08 Mar 2025
Get In Video: Add Anything You Want to the Video
Shaobin Zhuang
Zhipeng Huang
Binxin Yang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Chong Sun
Zheng-Jun Zha
Chen Li
Y. Wang
DiffM
VGen
49
0
0
08 Mar 2025
From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning
Shuangzhi Li
Junlong Shen
Lei Ma
Xingyu Li
3DPC
48
0
0
08 Mar 2025
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang
Yongchao Feng
Shuai Yang
Z. Liu
Qingjie Liu
Y. Wang
ObjD
122
0
0
08 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
85
2
0
08 Mar 2025
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
Yuxuan Bian
Zhaoyang Zhang
Xuan Ju
Mingdeng Cao
Liangbin Xie
Ying Shan
Qiang Xu
VGen
DiffM
68
0
0
07 Mar 2025
Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting
Dominic Maggio
Luca Carlone
100
0
0
07 Mar 2025
RA-DP: Rapid Adaptive Diffusion Policy for Training-Free High-frequency Robotics Replanning
Xi Ye
Rui Heng Yang
Jun Jin
Y. K. Li
Amir Rasouli
49
0
0
06 Mar 2025
GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding
Xihan Wang
Dianyi Yang
Yu Gao
Yufeng Yue
Yi Yang
M. Fu
3DGS
49
0
0
06 Mar 2025
Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks
Lukás Gajdosech
Hassan Ali
Jan-Gerrit Habekost
Martin Madaras
Matthias Kerzel
Stefan Wermter
54
0
0
06 Mar 2025
High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
Jialong Xue
Wei Gao
Yu Wang
Chao Ji
Dongdong Zhao
Shi Yan
Shiwu Zhang
43
0
0
06 Mar 2025
GenColor: Generative Color-Concept Association in Visual Design
Yihan Hou
Xingchen Zeng
Yusong Wang
Manling Yang
Xiaojiao Chen
Wei Zeng
DiffM
76
0
0
05 Mar 2025
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
Dujun Nie
Xianda Guo
Yiqun Duan
Ruijun Zhang
Long Chen
LM&Ro
121
2
0
04 Mar 2025
Bridging VLM and KMP: Enabling Fine-grained robotic manipulation via Semantic Keypoints Representation
Junjie Zhu
Huayu Liu
Jin Wang
Bangrong Wen
Kaixiang Huang
Xiaofei Li
Haiyun Zhan
Guodong Lu
60
0
0
04 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
43
0
0
04 Mar 2025
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
Yufei Wang
Ziyu Wang
Mino Nakura
Pratik Bhowal
Chia-Liang Kuo
Yi-Ting Chen
Zackory M. Erickson
David Held
59
0
0
04 Mar 2025
VisAgent: Narrative-Preserving Story Visualization Framework
Seungkwon Kim
GyuTae Park
Sangyeon Kim
Seung-Hun Nam
38
0
0
04 Mar 2025
OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments
Qianwei Wang
Yifan Xu
V. Kamat
Carol Menassa
42
0
0
03 Mar 2025
OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding
Dianyi Yang
Yu Gao
Xihan Wang
Yufeng Yue
Yi Yang
M. Fu
3DGS
64
1
0
03 Mar 2025
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang
Jiazhao Zhang
Yuqing Lan
Yulan Guo
Dezun Dong
Chenyang Zhu
K. Xu
124
0
0
03 Mar 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Y. Cao
Haodong Duan
D. Lin
Jiaqi Wang
ObjD
VLM
LRM
70
41
0
03 Mar 2025
Evaluating Stenosis Detection with Grounding DINO, YOLO, and DINO-DETR
Muhammad Musab Ansari
29
0
0
03 Mar 2025
Language-Guided Object Search in Agricultural Environments
Advaith Balaji
Saket Pradhan
Dmitry Berenson
LM&Ro
44
0
0
03 Mar 2025
RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation
Haichao Liu
Sikai Guo
Pengfei Mai
Jiahang Cao
Haoang Li
Jun Ma
39
0
0
03 Mar 2025
MTReD: 3D Reconstruction Dataset for Fly-over Videos of Maritime Domain
Rui Yi Yong
Samuel Picosson
Arnold Wiliem
32
0
0
02 Mar 2025
AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter
Yingbo Tang
S. Zhang
Xiaoshuai Hao
Pengwei Wang
Jianlong Wu
Z. Wang
Shanghang Zhang
63
5
0
02 Mar 2025
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Y. Zhang
MLLM
VLM
47
1
0
01 Mar 2025
Previous
123456...252627
Next