ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.05499
  4. Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXivPDFHTML

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,335 papers shown
Title
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
Zhichao Sun
Huazhang Hu
Yidong Ma
Gang Liu
Nemo Chen
Xu Tang
Yao Hu
Yongchao Xu
ObjD
47
0
0
24 Mar 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang
Alexandros Delitzas
Fangjinhua Wang
Ruida Zhang
Xiangyang Ji
Marc Pollefeys
Francis Engelmann
3DV
3DPC
47
4
0
24 Mar 2025
MaSS13K: A Matting-level Semantic Segmentation Benchmark
MaSS13K: A Matting-level Semantic Segmentation Benchmark
C. Xie
Minghan Li
Hui Zeng
Jun Luo
Lei Zhang
VLM
76
0
0
24 Mar 2025
OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning
OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning
Hui Li
Congcong Bian
Zeyang Zhang
Xiaoning Song
Xi Li
Xiao Wu
51
0
0
24 Mar 2025
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
Bin Li
Dehong Gao
Yeyuan Wang
Linbo Jin
Shanqing Yu
Xiaoyan Cai
Libin Yang
VLM
46
0
0
24 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
J. T. Wang
VLM
67
3
0
23 Mar 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Oucheng Huang
Yuhang Ma
Zeng Zhao
Mingrui Wu
Jiayi Ji
Rongsheng Zhang
Z. Hu
Xiaoshuai Sun
Rongrong Ji
41
0
0
22 Mar 2025
Enhancing Martian Terrain Recognition with Deep Constrained Clustering
Enhancing Martian Terrain Recognition with Deep Constrained Clustering
Tejas Panambur
M. Parente
47
0
0
22 Mar 2025
MagicColor: Multi-Instance Sketch Colorization
MagicColor: Multi-Instance Sketch Colorization
Y. Zhang
Yue Ma
Bingyuan Wang
Qifeng Chen
Zeyu Wang
DiffM
65
0
0
21 Mar 2025
Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting
Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting
Simona Kocour
Assia Benbihi
Aikaterini Adam
Torsten Sattler
3DPC
41
0
0
21 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
39
2
0
20 Mar 2025
Single Image Iterative Subject-driven Generation and Editing
Single Image Iterative Subject-driven Generation and Editing
Yair Shpitzer
Gal Chechik
Idan Schwartz
48
0
0
20 Mar 2025
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Ying Liu
Yijing Hua
Haojiang Chai
Yanbo Wang
TengQi Ye
ObjD
54
0
0
19 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li
Mi Zhang
Yiming Sun
Min Yang
DiffM
51
1
0
19 Mar 2025
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
Faseeh Ahmad
Hashim Ismail
Jonathan Styrud
Maj Stenmark
Volker Krueger
36
0
0
19 Mar 2025
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
Sungjae Lee
Yeonjoo Hong
Kwang In KIm
44
0
0
19 Mar 2025
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
Saad Lahlali
Sandra Kara
Hejer Ammar
Florian Chabot
Nicolas Granger
Hervé Le Borgne
Q. C. Pham
3DPC
55
0
0
19 Mar 2025
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Nvidia
Hassan Abu Alhaija
Jose M. Alvarez
Maciej Bala
Tiffany Cai
...
Yuchong Ye
Xiaodong Yang
X. Yang
Xiaohui Zeng
Yu Zeng
VGen
90
1
0
18 Mar 2025
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
Yifei Dong
Fengyi Wu
Qi He
Heng Li
Minghan Li
...
Yuxuan Zhou
Jingdong Sun
Qi Dai
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
38
0
0
18 Mar 2025
LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation
LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation
Yang Zhou
Shiyu Zhao
Y. Chen
Z. Wang
Dimitris N. Metaxas
ObjD
56
0
0
18 Mar 2025
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode
Junjia Huang
Pengxiang Yan
Jinhang Cai
Jiyang Liu
Zhao Wang
Yitong Wang
Xinglong Wu
Guanbin Li
DiffM
70
0
0
17 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu-Xi Cheng
VLM
50
1
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuxiang Cai
Yongheng Shang
Jianwei Yin
52
0
0
16 Mar 2025
VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility
VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility
Yitian Shi
Di Wen
Guanqi Chen
Edgar Welte
Sheng Liu
Kunyu Peng
Rainer Stiefelhagen
Rania Rayyes
63
1
0
16 Mar 2025
Exploring Contextual Attribute Density in Referring Expression Counting
Exploring Contextual Attribute Density in Referring Expression Counting
Zhicheng Wang
Zhiyu Pan
Zhan Peng
Jian Cheng
Liwen Xiao
Wei Jiang
Zhiguo Cao
42
0
0
16 Mar 2025
ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis
ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis
Yu Fang
Yue Yang
Xinghao Zhu
Kaiyuan Zheng
Gedas Bertasius
D. Szafir
Mingyu Ding
47
2
0
15 Mar 2025
SPOC: Spatially-Progressing Object State Change Segmentation in Video
SPOC: Spatially-Progressing Object State Change Segmentation in Video
Priyanka Mandikal
Tushar Nagarajan
Alex Stoken
Zihui Xue
Kristen Grauman
44
0
0
15 Mar 2025
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang
Zhehao Shen
Chengcheng Guo
Yu Hong
Zhuo Su
Y. Zhang
Marc Habermann
Lan Xu
59
1
0
15 Mar 2025
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Arsh Koneru
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VLM
56
1
0
15 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Y. Li
Xinlei Chen
Xiao-Ping Zhang
LRM
63
1
0
14 Mar 2025
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo
Seo Jin Lee
Seungwoo Lee
Seohyung Hong
Hyungseok Seo
Kyungsu Kim
41
0
0
14 Mar 2025
MTV-Inpaint: Multi-Task Long Video Inpainting
Shiyuan Yang
Zheng Gu
Liang Hou
Xin Tao
Pengfei Wan
Xiaodong Chen
Jing Liao
DiffM
58
0
0
14 Mar 2025
EmoAgent: Multi-Agent Collaboration of Plan, Edit, and Critic, for Affective Image Manipulation
Qi Mao
Haobo Hu
Yujie He
Difei Gao
Haokun Chen
Libiao Jin
DiffM
45
0
0
14 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
114
0
0
14 Mar 2025
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Georgy Ponimatkin
Martin Cífka
Tomáš Souček
Médéric Fourmy
Yann Labbé
Vladimir Petrik
Josef Sivic
44
1
0
13 Mar 2025
Attacking Multimodal OS Agents with Malicious Image Patches
Lukas Aichberger
Alasdair Paren
Y. Gal
Philip H. S. Torr
Adel Bibi
AAML
51
2
0
13 Mar 2025
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li
En Yu
Sijia Chen
Wenbing Tao
52
1
0
13 Mar 2025
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng
Xun Guo
Y. Wang
Jacob Zhiyuan Fang
Angtian Wang
Shenghai Yuan
Yiding Yang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
64
0
0
13 Mar 2025
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Zixian Liu
Mingtong Zhang
Yunzhu Li
49
0
0
13 Mar 2025
IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
Yiyang Ling
Karan Owalekar
Oluwatobiloba Adesanya
Erdem Bıyık
Daniel Seita
44
0
0
13 Mar 2025
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He
Bo Cheng
Yuhang Ma
Qingxiang Jia
Shanyuan Liu
Ao Ma
Xiaoyu Wu
Liebucha Wu
Dawei Leng
Yuhui Yin
DiffM
VLM
47
0
0
13 Mar 2025
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu
Xian Tao
Xinyi Gong
Shichen Qu
Qiyu Chen
Zhengtao Zhang
Xingang Wang
Guiguang Ding
VLM
59
0
0
13 Mar 2025
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Yuhan Wang
Cheng Liu
Daou Zhang
Weichao Wu
41
0
0
13 Mar 2025
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
ObjD
VLM
43
0
0
13 Mar 2025
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models
Zhihua Tian
Sirun Nan
Ming Xu
Shengfang Zhai
Wenjie Qu
Jian Liu
Kui Ren
Ruoxi Jia
Jiaheng Zhang
DiffM
85
1
0
12 Mar 2025
Deep Learning for Climate Action: Computer Vision Analysis of Visual Narratives on X
Katharina Prasse
Marcel Kleinmann
Inken Adam
Kerstin Beckersjuergen
Andreas Edte
...
Timotheus Gumpp
Steffen Jung
Isaac Bravo
Stefanie Walter
M. Keuper
52
0
0
12 Mar 2025
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen
Pi Bu
Yingyao Wang
Xinyi Wang
Ziming Wang
...
Qi Zhu
Jun Song
Siran Yang
Jiamang Wang
Bo Zheng
73
2
0
12 Mar 2025
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Xinyu Zhang
Haonan Chang
Yuhan Liu
Abdeslam Boularias
3DGS
39
0
0
12 Mar 2025
Online Language Splatting
Saimouli Katragadda
Cho-Ying Wu
Yuliang Guo
Xinyu Huang
G. Huang
Liu Ren
3DGS
OffRL
60
0
0
12 Mar 2025
Previous
12345...252627
Next