ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,336 papers shown
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim
Subhankar Roy
S. Essid
Vicky Kalogeiton
Stéphane Lathuilière
20
12
0
15 Dec 2023
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
Qin Guo
Tianwei Lin
DiffM
18
30
0
15 Dec 2023
MobileSAMv2: Faster Segment Anything to Everything
Chaoning Zhang
Dongshen Han
Sheng Zheng
J. Choi
Tae-Ho Kim
Choong Seon Hong
VLM
22
23
0
15 Dec 2023
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang
Yinpeng Dong
Shouwei Ruan
Xiao Yang
Hang Su
Xingxing Wei
DiffM
21
10
0
15 Dec 2023
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia
Dongchen Han
Yizeng Han
Xuran Pan
Shiji Song
Gao Huang
VLM
23
54
0
15 Dec 2023
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments
Chubin Zhang
Juncheng Yan
Yi Wei
Jiaxin Li
Li Liu
Yansong Tang
Yueqi Duan
Jiwen Lu
3DV
3DPC
25
9
0
14 Dec 2023
Reliability in Semantic Segmentation: Can We Use Synthetic Data?
Thibaut Loiseau
Tuan-Hung Vu
Mickaël Chen
Patrick Pérez
Matthieu Cord
UQCV
23
12
0
14 Dec 2023
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P. K. Poudel
Harit Pandya
Stephan Liwicki
Roberto Cipolla
DRL
OffRL
18
6
0
14 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
14
4
0
14 Dec 2023
Exploration of visual prompt in Grounded pre-trained open-set detection
Qibo Chen
Weizhong Jin
Shuchang Li
Mengdi Liu
Li Yu
Jian Jiang
Xiaozheng Wang
VLM
13
0
0
14 Dec 2023
UniTeam: Open Vocabulary Mobile Manipulation Challenge
Andrew Melnik
Michael Büttner
Leon Harz
Lyon Brown
G. C. Nandi
PS Arjun
Gaurav Kumar Yadav
Rahul Kala
R. Haschke
LM&Ro
30
12
0
14 Dec 2023
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro
AI4CE
LRM
VLM
35
137
0
13 Dec 2023
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun
Runjia Li
Philip H. S. Torr
Xiuye Gu
Siyang Li
VLM
CLIP
20
32
0
12 Dec 2023
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Hu Zhang
Jianhua Xu
Tao Tang
Haiyang Sun
Xin Yu
Zi Huang
Kaicheng Yu
ObjD
3DPC
33
12
0
12 Dec 2023
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
Shufan Li
Harkanwar Singh
Aditya Grover
DiffM
16
7
0
11 Dec 2023
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang
S. D. Semujju
Zhicheng Wang
Xianwei Lv
Kang Xu
...
Jing Wu
Zhuo Long
Wensheng Liang
Xiaoguang Ma
Ruiyan Zhuang
UQCV
AI4TS
AI4CE
27
4
0
11 Dec 2023
OpenSD: Unified Open-Vocabulary Segmentation and Detection
Shuai Li
Ming-hui Li
Pengfei Wang
Lei Zhang
ObjD
VLM
24
6
0
10 Dec 2023
RepViT-SAM: Towards Real-Time Segmenting Anything
Ao Wang
Hui Chen
Zijia Lin
Jungong Han
Guiguang Ding
VLM
20
19
0
10 Dec 2023
Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
A. Dutta
Srijan Das
Jacob Nielsen
Rajatsubhra Chakraborty
Mubarak Shah
22
9
0
07 Dec 2023
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Chengshu Li
Jacky Liang
Andy Zeng
Xinyun Chen
Karol Hausman
Dorsa Sadigh
Sergey Levine
Fei-Fei Li
Fei Xia
Brian Ichter
LLMAG
LRM
31
70
0
07 Dec 2023
GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives
Zuyao Chen
Jinlin Wu
Zhen Lei
Zhaoxiang Zhang
Changwen Chen
13
2
0
07 Dec 2023
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam
Cheng-Kun Yang
Min-Hung Chen
Jen-Hui Chuang
Yen-Yu Lin
24
11
0
07 Dec 2023
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang
Bichen Wu
Xiaoyan Wang
Yaqiao Luo
Luxin Zhang
Yinan Zhao
Peter Vajda
Dimitris N. Metaxas
Licheng Yu
VGen
DiffM
34
33
0
06 Dec 2023
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang
Zhizhou Sha
Zheng Ding
Yilin Wang
Zhuowen Tu
DiffM
27
21
0
06 Dec 2023
Foundation Model Assisted Weakly Supervised Semantic Segmentation
Xiaobo Yang
Xiaojin Gong
VLM
26
22
0
06 Dec 2023
FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation
Olivia Markham
Yuhao Chen
Chi-en Amy Tai
Alexander Wong
13
3
0
06 Dec 2023
Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control
Sitong Su
Litao Guo
Lianli Gao
Hengtao Shen
Jingkuan Song
DiffM
35
3
0
06 Dec 2023
VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
Naoki Yokoyama
Sehoon Ha
Dhruv Batra
Jiuguang Wang
Bernadette Bucher
LM&Ro
21
77
0
06 Dec 2023
DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing
Shao-Yu Chang
Hwann-Tzong Chen
Tyng-Luh Liu
DiffM
VGen
31
3
0
05 Dec 2023
AI-SAM: Automatic and Interactive Segment Anything Model
Yimu Pan
Sitao Zhang
Alison D. Gernand
Jeffery A. Goldstein
J. Z. Wang
VLM
32
4
0
05 Dec 2023
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon
Yonatan Bitton
Yonatan Shafir
Roopal Garg
Xi Chen
Dani Lischinski
Daniel Cohen-Or
Idan Szpektor
35
11
0
05 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
87
68
0
05 Dec 2023
Stable Diffusion Exposed: Gender Bias from Prompt to Image
Yankun Wu
Yuta Nakashima
Noa Garcia
23
16
0
05 Dec 2023
Lenna: Language Enhanced Reasoning Detection Assistant
Fei Wei
Xinyu Zhang
Ailing Zhang
Bo-Wen Zhang
Xiangxiang Chu
MLLM
LRM
22
23
0
05 Dec 2023
Human Demonstrations are Generalizable Knowledge for Robots
Te Cui
Guangyan Chen
Tianxing Zhou
Zicai Peng
Mengxiao Hu
Haoyang Lu
Haizhou Li
Meiling Wang
Yi Yang
Yufeng Yue
LM&Ro
27
6
0
05 Dec 2023
Working Backwards: Learning to Place by Picking
Oliver Limoyo
Abhisek Konar
Trevor Ablett
Jonathan Kelly
F. Hogan
Gregory Dudek
16
0
0
04 Dec 2023
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen
Chaoyou Fu
Peixian Chen
Mengdan Zhang
Ke Li
Xing Sun
Yunsheng Wu
Shaohui Lin
Rongrong Ji
VLM
ObjD
46
33
0
04 Dec 2023
Instance-guided Cartoon Editing with a Large-scale Dataset
Jian Lin
Chengze Li
Xueting Liu
Zhongping Ge
26
0
0
04 Dec 2023
Unveiling Objects with SOLA: An Annotation-Free Image Search on the Object Level for Automotive Data Sets
Philipp Rigoll
Jacob Langner
Eric Sax
31
3
0
04 Dec 2023
Collaborative Neural Painting
Nicola Dall’Asen
Willi Menapace
E. Peruzzo
E. Sangineto
Yiming Wang
Elisa Ricci
24
0
0
04 Dec 2023
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He
Shaofei Huang
Xuecheng Nie
Tianrui Hui
Luoqi Liu
Jiao Dai
Jizhong Han
Guanbin Li
Si Liu
DiffM
22
7
0
04 Dec 2023
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu
Cairong Zhang
Yitong Wang
Jiahao Wang
Yujiu Yang
Yansong Tang
VLM
VOS
47
15
0
04 Dec 2023
Learning Efficient Unsupervised Satellite Image-based Building Damage Detection
Yiyun Zhang
Zijian Wang
Yadan Luo
Xin Yu
Zi Huang
18
4
0
04 Dec 2023
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu
Benran Hu
Chi-Keung Tang
Yu-Wing Tai
24
11
0
03 Dec 2023
LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor
Yiming Zeng
Mingdong Wu
Long Yang
Jiyao Zhang
Hao Ding
Hui Cheng
Hao Dong
DiffM
11
8
0
03 Dec 2023
Looking Inside Out: Anticipating Driver Intent From Videos
Yung-chi Kung
Arthur Zhang
Jun-ming Wang
Joydeep Biswas
19
1
0
03 Dec 2023
Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey
Paul Guerrero
Matheus Gadelha
Yannick Hold-Geoffroy
Karan Singh
Niloy Mitra
DiffM
21
32
0
02 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
19
21
0
01 Dec 2023
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
37
39
0
01 Dec 2023
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang
Tianxing Wu
Shuai Yang
Chenyang Si
Dahua Lin
Yu Qiao
Chen Change Loy
Ziwei Liu
DiffM
VGen
32
65
0
01 Dec 2023