ResearchTrend.AI
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,336 papers shown
BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once
Theodore Zhao
Yu Gu
Jianwei Yang
Naoto Usuyama
Ho Hin Lee
...
B. Piening
Carlo Bifulco
Mu-Hsin Wei
Hoifung Poon
Sheng Wang
MedIm
36
22
0
21 May 2024
Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
Yi Cheng
Ziwei Xu
Dongyun Lin
Harry Cheng
Yongkang Wong
Ying Sun
Joo Hwee Lim
Mohan S. Kankanhalli
36
0
0
21 May 2024
WorldAfford: Affordance Grounding based on Natural Language Instructions
Changmao Chen
Yuren Cong
Zhen Kan
22
4
0
21 May 2024
URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images
Zoey Chen
Aaron Walsman
Marius Memmel
Kaichun Mo
Alex Fang
Karthikeya Vemuri
Alan Wu
Dieter Fox
Abhishek Gupta
AI4CE
VGen
63
26
0
19 May 2024
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
Ying Jin
Pengyang Ling
Xiaoyi Dong
Pan Zhang
Jiaqi Wang
Dahua Lin
32
2
0
18 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
28
5
0
17 May 2024
Grounded 3D-LLM with Referent Tokens
Yilun Chen
Shuai Yang
Haifeng Huang
Tai Wang
Ruiyuan Lyu
Runsen Xu
Dahua Lin
Jiangmiao Pang
50
22
0
16 May 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
...
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
ObjD
VLM
36
34
0
16 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
33
12
0
16 May 2024
VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing
Binghui Chen
Chongyang Zhong
Wangmeng Xiang
Yifeng Geng
Xuansong Xie
DiffM
28
6
0
16 May 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge
Yihe Tang
Jiashu Xu
Cem Gokmen
Chengshu Li
...
Miao Liu
Pengchuan Zhang
Ruohan Zhang
Fei-Fei Li
Jiajun Wu
VGen
48
6
0
15 May 2024
Compositional Text-to-Image Generation with Dense Blob Representations
Weili Nie
Sifei Liu
Morteza Mardani
Chao Liu
Benjamin Eckart
Arash Vahdat
DiffM
80
17
0
14 May 2024
MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models
Jiajia Li
Kyle Lammers
Xunyuan Yin
Xiang Yin
Long He
Renfu Lu
Zhaojian Li
27
3
0
14 May 2024
In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls
Akil Pathiranage
Chris Czarnecki
Yuhao Chen
Pengcheng Xi
Linlin Xu
Alexander Wong
15
0
0
12 May 2024
How Much You Ate? Food Portion Estimation on Spoons
Aaryam Sharma
Chris Czarnecki
Yuhao Chen
Pengcheng Xi
Linlin Xu
Alexander Wong
13
1
0
12 May 2024
Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People
Masaki Kuribayashi
Kohei Uehara
Allan Wang
Daisuke Sato
Simon Chu
Shigeo Morishima
35
1
0
11 May 2024
Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation
Shengyuan Liu
Bo Wang
Ye Ma
Te Yang
Xipeng Cao
Quan Chen
Han Li
Di Dong
Peng Jiang
EGVM
44
2
0
11 May 2024
To Ask or Not To Ask: Human-in-the-loop Contextual Bandits with Applications in Robot-Assisted Feeding
Rohan Banerjee
Rajat Kumar Jenamani
Sidharth Vasudev
Amal Nanavati
Katherine Dimitropoulou
Sarah Dean
T. Bhattacharjee
66
2
0
11 May 2024
Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach
Elham Ravanbakhsh
Cheng Niu
Yongqing Liang
J. Ramanujam
Xin Li
VLM
52
0
0
10 May 2024
Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection
Koji Takeda
Kanji Tanaka
Yoshimasa Nakamura
Asako Kanezaki
58
0
0
10 May 2024
Probing Multimodal LLMs as World Models for Driving
Shiva Sreeram
T. Wang
Alaa Maalouf
Guy Rosman
S. Karaman
Daniela Rus
30
7
0
09 May 2024
A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective
Huaiyuan Xu
Junliang Chen
Shiyu Meng
Yi Wang
Lap-Pui Chau
3DPC
41
16
0
08 May 2024
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
Yuying Ge
Sijie Zhao
Chen Li
Yixiao Ge
Ying Shan
30
26
0
07 May 2024
Video Diffusion Models: A Survey
Andrew Melnik
Michal Ljubljanac
Cong Lu
Qi Yan
Weiming Ren
Helge J. Ritter
VGen
71
12
0
06 May 2024
Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
Mohamad Al Mdfaa
Raghad Salameh
Sergey Zagoruyko
Gonzalo Ferrer
30
0
0
03 May 2024
Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion
Yuxiang Huang
Yuhao Chen
John S. Zelek
38
1
0
02 May 2024
Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks
Murtaza Dalal
Tarun Chiruvolu
Devendra Singh Chaplot
Ruslan Salakhutdinov
LM&Ro
34
39
0
02 May 2024
LocInv: Localization-aware Inversion for Text-Guided Image Editing
Chuanming Tang
Kai Wang
Fei Yang
J. Weijer
DiffM
39
3
0
02 May 2024
Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark
Juncheng Li
D. Cappelleri
33
2
0
01 May 2024
MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation
Xujie Zhang
Ente Lin
Xiu Li
Yuxuan Luo
Michael C. Kampffmeyer
Xin Dong
Xiaodan Liang
51
10
0
01 May 2024
CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model
Wei Zhang
Wong Kam-Kwai
Biying Xu
Yiwen Ren
Yuhuai Li
Minfeng Zhu
Yingchaojie Feng
Wei Chen
35
2
0
01 May 2024
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models
Hongzhan Lin
Zixin Chen
Ziyang Luo
Mingfei Cheng
Jing Ma
Guang Chen
31
6
0
01 May 2024
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng
Baiqiao Yin
Kaixin Cai
Minbin Huang
Hanhui Li
...
Yue Li
Yifei Li
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
MLLM
32
12
0
29 Apr 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Fuchun Sun
Bin Fang
AI4CE
LM&Ro
69
13
0
28 Apr 2024
DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images
Maria Mihaela Truşcǎ
Tinne Tuytelaars
Marie-Francine Moens
DiffM
46
1
0
27 Apr 2024
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Laksh Nanwani
Kumaraditya Gupta
Aditya Mathur
Swayam Agrawal
A. H. A. Hafez
K. M. Krishna
32
0
0
27 Apr 2024
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
Xin Jiang
Hao Tang
Rui Yan
Jinhui Tang
Zechao Li
42
3
0
24 Apr 2024
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Muller
Georgios Kaissis
Daniel Rueckert
32
5
0
24 Apr 2024
PhyPlan: Generalizable and Rapid Physical Task Planning with Physics Informed Skill Networks for Robot Manipulators
Mudit Chopra
Abhinav Barnawal
Harshil Vagadia
Tamajit Banerjee
Shreshth Tuli
Souvik Chakraborty
Rohan Paul
LRM
PINN
40
0
0
22 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
31
17
0
22 Apr 2024
Clio: Real-time Task-Driven Open-Set 3D Scene Graphs
Dominic Maggio
Yun Chang
Nathan Hughes
Matthew Trang
Dan Griffith
Carlyn Dougherty
Eric Cristofalo
Lukas Schmid
Luca Carlone
3DV
38
32
0
21 Apr 2024
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization
Zhaopeng Gu
Bingke Zhu
Guibo Zhu
Yingying Chen
Hao Li
Ming Tang
Jinqiao Wang
42
15
0
21 Apr 2024
Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models
Yuyan Shi
Jialu Ma
Jin Yang
Shasha Wang
Yichi Zhang
MedIm
VLM
19
2
0
20 Apr 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong
Bingqi Ma
Dazhong Shen
Guanglu Song
Hao Shao
Dongzhi Jiang
Hongsheng Li
Yu Liu
MoE
45
40
0
19 Apr 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma
Yi Jiang
Jiannan Wu
Zehuan Yuan
Xiaojuan Qi
VLM
ObjD
37
51
0
19 Apr 2024
Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Myrna C. Silva
Mahtab Dahaghin
M. Toso
Alessio Del Bue
3DGS
29
11
0
19 Apr 2024
ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation
Yu-Hsuan Ho
Longxiang Li
Ali Mostafavi
21
2
0
19 Apr 2024
GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models
Sai Sree Harsha
Ambareesh Revanur
Dhwanit Agarwal
Shradha Agrawal
VGen
DiffM
45
3
0
18 Apr 2024
What does CLIP know about peeling a banana?
Claudia Cuttano
Gabriele Rosi
Gabriele Trivigno
Giuseppe Averta
29
2
0
18 Apr 2024
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai
Sibei Yang
34
8
0
18 Apr 2024