Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
arXiv:2303.05499

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,335 papers shown
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips
Reshma Ramaprasad
6
5
0
01 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
9
3
0
29 Sep 2023
UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling
Linghao Yang
Yanmin Wu
Yu Deng
Rui Tian
Xinggang Hu
Tiefeng Ma
11
1
0
29 Sep 2023
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Yuanyi Zhong
Alihusein Kuwajerwala
Sacha Morin
Krishna Murthy Jatavallabhula
Bipasha Sen
...
Celso Miguel de Melo
Joshua B. Tenenbaum
Antonio Torralba
Florian Shkurti
Liam Paull
LM&Ro
27
166
0
28 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
61
222
0
26 Sep 2023
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Han Lin
Abhaysinh Zala
Jaemin Cho
Mohit Bansal
LM&Ro
VGen
DiffM
37
74
0
26 Sep 2023
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz
Selim Kuzucu
Tom Joy
P. Dokania
MoE
22
5
0
26 Sep 2023
Motion Segmentation from a Moving Monocular Camera
Yuxiang Huang
John S. Zelek
VOS
26
5
0
24 Sep 2023
Detect Everything with Few Examples
Xinyu Zhang
Yuting Wang
Abdeslam Boularias
ObjD
VLM
21
13
0
22 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
18
20
0
20 Sep 2023
Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill
Wenzhe Cai
Siyuan Huang
Guangran Cheng
Yuxing Long
Peng Gao
Changyin Sun
Hao Dong
LM&Ro
19
41
0
19 Sep 2023
Specification-Driven Video Search via Foundation Models and Formal Verification
Yunhao Yang
Jean-Raphael Gaglione
Sandeep P. Chinchali
Ufuk Topcu
55
5
0
18 Sep 2023
Triple Regression for Camera Agnostic Sim2Real Robot Grasping and Manipulation Tasks
Yuanhong Zeng
Yizhou Zhao
Ying Nian Wu
20
0
0
16 Sep 2023
Efficient Object Rearrangement via Multi-view Fusion
Dehao Huang
Chao Tang
Hong Zhang
OCL
12
4
0
16 Sep 2023
GRID: Scene-Graph-based Instruction-driven Robotic Task Planning
Zhe Ni
Xiao-Xin Deng
Cong Tai
Xin-Yue Zhu
Qinghongbing Xie
Y. Liu
Xiang Wu
Long Zeng
LM&Ro
19
14
0
14 Sep 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
28
5
0
13 Sep 2023
Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos
Sarthak Bhagat
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
23
4
0
12 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng-Tao Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
30
115
0
07 Sep 2023
Tracking Anything with Decoupled Video Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOS
VLM
30
121
0
07 Sep 2023
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Hassan el-Hajj
Matteo Valleriani
11
0
0
04 Sep 2023
Big-model Driven Few-shot Continual Learning
Ziqi Gu
Chunyan Xu
Zihan Lu
Xin Liu
Anbo Dai
Zhen Cui
CLL
22
1
0
02 Sep 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
39
46
0
01 Sep 2023
GREC: Generalized Referring Expression Comprehension
Shuting He
Henghui Ding
Chang Liu
Xudong Jiang
ObjD
19
14
0
30 Aug 2023
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model
Tianyu Wang
Yifan Li
Haitao Lin
Xiangyang Xue
Yanwei Fu
LM&Ro
14
8
0
30 Aug 2023
Zero-Shot Edge Detection with SCESAME: Spectral Clustering-based Ensemble for Segment Anything Model Estimation
Hiroaki Yamagiwa
Yusuke Takase
Hiroyuki Kambe
Ryosuke Nakamoto
VLM
18
5
0
26 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Fuwen Luo
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
14
45
0
25 Aug 2023
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
Yi Yao
Peng Liu
Tiancheng Zhao
Qianqian Zhang
Jiajia Liao
Chunxin Fang
Kyusong Lee
Qing Wang
VLM
ObjD
17
12
0
25 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
24
48
0
23 Aug 2023
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations
Sreyan Ghosh
Chandra Kiran Reddy Evuru
Sonal Kumar
Utkarsh Tyagi
Sakshi Singh
Sanjoy Chowdhury
Dinesh Manocha
OOD
20
1
0
19 Aug 2023
MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation
Jiaqi Yang
Yucong Chen
Xiangting Meng
C. Yan
Ming Li
Ran Chen
Lige Liu
Tao Sun
L. Kneip
42
1
0
17 Aug 2023
A One Stop 3D Target Reconstruction and multilevel Segmentation Method
J. Xu
Wei-Ye Zhao
Zhiyan Tang
X. Gan
3DV
11
2
0
14 Aug 2023
Polyp-SAM++: Can A Text Guided SAM Perform Better for Polyp Segmentation?
Risab Biswas
MedIm
33
22
0
12 Aug 2023
Follow Anything: Open-set detection, tracking, and following in real-time
Alaa Maalouf
Ninad Jadhav
Krishna Murthy Jatavallabhula
Makram Chahine
Daniel M. Vogt
Robert J. Wood
Antonio Torralba
Daniela Rus
14
23
0
10 Aug 2023
Pseudo-label Alignment for Semi-supervised Instance Segmentation
Jie Hu
Cheng Chen
Liujuan Cao
Shengchuan Zhang
Annan Shu
Guannan Jiang
Rongrong Ji
ISeg
28
12
0
10 Aug 2023
Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception
Yunhao Yang
Cyrus Neary
Ufuk Topcu
LM&Ro
OffRL
22
5
0
10 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
19
0
0
08 Aug 2023
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Yasheng Sun
Yifan Yang
Houwen Peng
Yifei Shen
Yuqing Yang
Hang-Rui Hu
Lili Qiu
Hideki Koike
DiffM
LM&Ro
27
33
0
02 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
29
391
0
01 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG
SyDa
34
41
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
117
0
25 Jul 2023
Fashion Matrix: Editing Photos by Just Talking
Zheng Chong
Xujie Zhang
Fuwei Zhao
Zhenyu Xie
Xiaodan Liang
DiffM
19
2
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
32
30
0
24 Jul 2023
Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul
Keno Moenck
Arne Wendt
Philipp Prünte
Julian Koch
Arne Sahrhage
...
Falko Kähler
Dirk Holst
Martin Gomse
Thorsten Schuppstuhl
Daniel Schoepflin
VLM
26
6
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
28
5
0
23 Jul 2023
Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
Jiancang Ma
Junhao Liang
Chen Chen
H. Lu
18
138
0
21 Jul 2023
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang
Hui Chen
Zijia Lin
Hengjun Pu
Guiguang Ding
27
173
0
18 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Chaoyang Zhu
Long Chen
ObjD
VLM
24
32
0
18 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
33
106
0
17 Jul 2023
Sim2Plan: Robot Motion Planning via Message Passing between Simulation and Reality
Yizhou Zhao
Yuanhong Zeng
Qiang Long
Ying Nian Wu
Song-Chun Zhu
14
0
0
15 Jul 2023
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
R. Liu
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ke Cao
Yufan Chen
Kailun Yang
Rainer Stiefelhagen
19
15
0
15 Jul 2023