ResearchTrend.AI
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
arXiv: 2303.05499

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,336 papers shown
CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation
Zhongzhen Huang
Yankai Jiang
Rongzhao Zhang
Shaoting Zhang
Xiaofan Zhang
MedIm
64
4
0
11 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
48
41
0
11 Jun 2024
UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving
Daniel Bogdoll
Noël Ollick
Tim Joseph
J. Marius Zöllner
37
1
0
10 Jun 2024
Tuning-Free Visual Customization via View Iterative Self-Attention Control
Xiaojie Li
Chenghao Gu
Shuzhao Xie
Yunpeng Bai
Weixiang Zhang
Zhi Wang
37
0
0
10 Jun 2024
Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation
Juhyeong Seon
Woobin Im
Sebin Lee
Jumin Lee
Sung-eui Yoon
34
1
0
10 Jun 2024
Open-Vocabulary Part-Based Grasping
Tjeard van Oort
Dimity Miller
Will N. Browne
Nicolas Marticorena
Jesse Haviland
Niko Suenderhauf
3DPC
32
2
0
10 Jun 2024
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang
Qifan Zhang
Yu-Wei Chao
Bowen Wen
Xiaohu Guo
Yu Xiang
3DH
53
2
0
10 Jun 2024
InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
Yunchao Zhang
Guandao Yang
Leonidas J. Guibas
Yanchao Yang
3DGS
33
1
0
09 Jun 2024
Utilizing Grounded SAM for self-supervised frugal camouflaged human detection
Matthias Pijarowski
Alexander Wolpert
Martin Heckmann
Michael Teutsch
40
1
0
09 Jun 2024
VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction
Hanlin Chen
Fangyin Wei
Chen Li
Tianxin Huang
Yunsong Wang
Gim Hee Lee
3DGS
3DV
34
12
0
09 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
78
12
0
09 Jun 2024
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang
Wenbin He
Xiwei Xuan
Clint Sebastian
Jorge Henrique Piazentin Ono
...
Sima Behpour
T. Doan
Liang Gou
Han-Wei Shen
Liu Ren
VLM
27
5
0
07 Jun 2024
3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Xiaobiao Du
Haiyang Sun
Shuyun Wang
Zhuojie Wu
Hongwei Sheng
Jiaying Ying
Ming Lu
Tianqing Zhu
Kun Zhan
Xin Yu
3DPC
30
6
0
07 Jun 2024
3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation
Feiyu Pan
Hao Fang
Xiankai Lu
34
3
0
07 Jun 2024
Coherent Zero-Shot Visual Instruction Generation
Quynh Phung
Songwei Ge
Jia-Bin Huang
49
2
0
06 Jun 2024
Matching Anything by Segmenting Anything
Siyuan Li
Lei Ke
Martin Danelljan
Luigi Piccinelli
Mattia Segu
Luc Van Gool
Fisher Yu
VOS
37
22
0
06 Jun 2024
Searching Priors Makes Text-to-Video Synthesis Better
Haoran Cheng
Liang Peng
Linxuan Xia
Yuepeng Hu
Hengjia Li
Qinglin Lu
Xiaofei He
Boxi Wu
VGen
DiffM
36
0
0
05 Jun 2024
Balancing Performance and Efficiency in Zero-shot Robotic Navigation
Dmytro Kuzmenko
N. Shvai
LM&Ro
32
0
0
05 Jun 2024
Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion
Colin Hansen
Simas Glinskis
Ashwin Raju
Micha Kornreich
JinHyeong Park
Jayashri Pawar
Richard Herzog
Li Zhang
Benjamin Odry
MedIm
DiffM
59
3
0
04 Jun 2024
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Haodong Hong
Sen Wang
Zi Huang
Qi Wu
Jiajun Liu
38
3
0
04 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
F. Khan
VLM
ISeg
80
6
0
04 Jun 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
An-Chieh Cheng
Hongxu Yin
Yang Fu
Qiushan Guo
Ruihan Yang
Jan Kautz
Xiaolong Wang
Sifei Liu
LRM
48
44
0
03 Jun 2024
ELSA: Evaluating Localization of Social Activities in Urban Streets
Maryam Hosseini
Marco Cipriano
Sedigheh Eslami
Daniel Hodczak
Liu Liu
Andres Sevtsuk
Gerard de Melo
41
0
0
03 Jun 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng
Xi Lu
Hanhui Li
Khun Loun Zai
Baiqiao Yin
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
VGen
37
10
0
03 Jun 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang
Haiyang Xu
Haitao Jia
Xi Zhang
Ming Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
LM&Ro
LLMAG
34
46
0
03 Jun 2024
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
44
6
0
02 Jun 2024
Artemis: Towards Referential Understanding in Complex Videos
Jihao Qiu
Yuan Zhang
Xi Tang
Lingxi Xie
Tianren Ma
Pengyu Yan
David Doermann
Qixiang Ye
Yunjie Tian
VLM
VGen
44
8
0
01 Jun 2024
Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners
Zhi Zheng
Qian Feng
Hang Li
Alois C. Knoll
Jianxiang Feng
54
6
0
01 Jun 2024
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations
Tiancheng Shen
Jun Hao Liew
Long Mai
Lu Qi
Jiashi Feng
Jiaya Jia
DiffM
30
1
0
31 May 2024
Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning
Yang Chen
Tian He
Junfeng Fu
Ling Wang
Jingcai Guo
Hong Cheng
VLM
31
2
0
31 May 2024
On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
Selim Kuzucu
Kemal Oksuz
Jonathan Sadeghi
P. Dokania
39
4
0
30 May 2024
Vision-based Manipulation from Single Human Video with Open-World Object Graphs
Yifeng Zhu
Arisrei Lim
Peter Stone
Yuke Zhu
27
33
0
30 May 2024
FMARS: Annotating Remote Sensing Images for Disaster Management using Foundation Models
Edoardo Arnaudo
Jacopo Lungo Vaschetti
Lorenzo Innocenti
Luca Barco
Davide Lisi
V. Fissore
Claudio Rossi
33
1
0
30 May 2024
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
Fangyi Chen
Han Zhang
Zhantao Yang
Hao Chen
Kai Hu
Marios Savvides
ObjD
VLM
38
5
0
30 May 2024
Creating Language-driven Spatial Variations of Icon Images
Xianghao Xu
Aditya Ganeshan
K. Willis
Yewen Pu
Daniel E. Ritchie
42
0
0
30 May 2024
Grasp as You Say: Language-guided Dexterous Grasp Generation
Yi-Lin Wei
Jian-Jian Jiang
Chengyi Xing
Xiantuo Tan
Xiao-Ming Wu
Hao Li
M. Cutkosky
Wei-Shi Zheng
51
13
0
29 May 2024
AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization
Junjie Shentu
Matthew Watson
Noura Al Moubayed
DiffM
49
0
0
28 May 2024
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang
Bin Chen
Bin Kang
Yulin Li
Yichi Chen
Weizhi Xian
Huifeng Chang
VLM
ObjD
36
7
0
28 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
56
4
0
28 May 2024
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Haoyu Zhao
Wenhang Ge
Ying-cong Chen
ObjD
MLLM
VLM
32
4
0
27 May 2024
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu
Yifeng Yang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
VLM
34
2
0
26 May 2024
PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus
Zhaochen Liu
Limeng Qiao
Xiangxiang Chu
Tingting Jiang
37
2
0
25 May 2024
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM
Abdur Rahman
Rajat Chawla
Muskaan Kumar
Arkajit Datta
Adarsh Jha
NS Mukunda
Ishaan Bhola
42
2
0
24 May 2024
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments
Yang Zhou
Long Quang
Carlos Nieto-Granda
Giuseppe Loianno
19
2
0
23 May 2024
Tuning-free Universally-Supervised Semantic Segmentation
Xiaobo Yang
Xiaojin Gong
VLM
50
1
0
23 May 2024
Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography
N. Chung
Sensen Gao
Tuan-Anh Vu
Jie M. Zhang
Aishan Liu
Yun Lin
Jin Song Dong
Qi Guo
AAML
37
9
0
23 May 2024
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Simon Damm
M. Laszkiewicz
Johannes Lederer
Asja Fischer
54
3
0
23 May 2024
Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models
Qiji Zhou
Ruochen Zhou
Zike Hu
Panzhong Lu
Siyang Gao
Yue Zhang
LRM
38
13
0
22 May 2024
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding
Canyu Zhao
Wen Wang
Zhen Yang
Zide Liu
Hao Chen
Chunhua Shen
DiffM
46
20
0
22 May 2024
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
62
0
0
22 May 2024