Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,336 papers shown
Title
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
79
244
0
29 Jan 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
26
103
0
29 Jan 2024
IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models
Xingchen Zeng
Ziyao Gao
Yilin Ye
Wei Zeng
12
12
0
28 Jan 2024
SAM-based instance segmentation models for the automation of structural damage detection
Zehao Ye
Lucy Lovell
A. Faramarzi
Jelena Ninić
16
13
0
27 Jan 2024
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
Tianhe Ren
Shilong Liu
Ailing Zeng
Jing Lin
Kunchang Li
...
Feng Li
Jie-jin Yang
Hongyang Li
Qing Jiang
Lei Zhang
VLM
35
378
0
25 Jan 2024
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
M. S. Seyfioglu
Karim Bouyarmane
Suren Kumar
Amir Tavanaei
Ismail B. Tutar
DiffM
28
6
0
24 Jan 2024
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
Wei Li
Xue Xu
Jiachen Liu
Xinyan Xiao
20
5
0
24 Jan 2024
ChatterBox: Multi-round Multimodal Referring and Grounding
Yunjie Tian
Tianren Ma
Lingxi Xie
Jihao Qiu
Xi Tang
Yuan Zhang
Jianbin Jiao
Qi Tian
Qixiang Ye
23
14
0
24 Jan 2024
PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation
Zhaozhi Xie
Bochen Guan
Weihao Jiang
Muyang Yi
Yue Ding
Hongtao Lu
Lei Zhang
VLM
36
13
0
23 Jan 2024
Zero-Shot Learning for the Primitives of 3D Affordance in General Objects
Hyeonwoo Kim
Sookwan Han
Patrick Kwon
Hanbyul Joo
DiffM
36
14
0
23 Jan 2024
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
Qinhong Zhou
Sunli Chen
Yisong Wang
Haozhe Xu
Weihua Du
Hongxin Zhang
Yilun Du
Josh Tenenbaum
Chuang Gan
AI4CE
20
13
0
23 Jan 2024
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia
Yang Fu
Sifei Liu
Xiaolong Wang
20
15
0
23 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
22
5
0
23 Jan 2024
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Peiqi Liu
Yaswanth Orru
Jay Vakil
Chris Paxton
Nur Muhammad (Mahi) Shafiullah
Lerrel Pinto
LM&Ro
VLM
95
27
0
22 Jan 2024
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Ci-Siang Lin
Chien-Yi Wang
Yu-Chiang Frank Wang
Min-Hung Chen
VLM
21
0
0
22 Jan 2024
General Flow as Foundation Affordance for Scalable Robot Learning
Chengbo Yuan
Chuan Wen
Tong Zhang
Yang Gao
AI4CE
21
31
0
21 Jan 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
142
706
0
19 Jan 2024
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu
Xiangtai Li
Chenyang Si
Shangchen Zhou
Jingkang Yang
...
Yining Li
Kai Chen
Yunhai Tong
Ziwei Liu
Chen Change Loy
VGen
DiffM
MLLM
24
17
0
18 Jan 2024
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva
Andrew Zisserman
29
13
0
18 Jan 2024
OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality
Aditya Sharma
Luke Yoffe
Tobias Höllerer
25
8
0
17 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
51
35
0
16 Jan 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya
Aniruddha Kembhavi
Dhruv Batra
Z. Kira
Kuo-Hao Zeng
Luca Weihs
VLM
33
4
0
15 Jan 2024
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving
H. Zunair
Md. Shakib Khan
A. Ben Hamza
17
6
0
14 Jan 2024
AffordanceLLM: Grounding Affordance from Vision Language Models
Shengyi Qian
Weifeng Chen
Min Bai
Xiong Zhou
Zhuowen Tu
Li Erran Li
15
20
0
12 Jan 2024
PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
Hyunjin Kim
Minhyuk Sung
43
8
0
11 Jan 2024
Wasserstein Distance-based Expansion of Low-Density Latent Regions for Unknown Class Detection
Prakash Mallick
Feras Dayoub
Jamie Sherrah
13
1
0
10 Jan 2024
RePLan: Robotic Replanning with Perception and Language Models
Marta Skreta
Zihan Zhou
Jia Lin Yuan
Kourosh Darvish
Alán Aspuru-Guzik
Animesh Garg
LM&Ro
LRM
35
26
0
08 Jan 2024
ExTraCT -- Explainable Trajectory Corrections from language inputs using Textual description of features
J-Anne Yow
N. P. Garg
Manoj Ramanathan
Wei Tech Ang
26
5
0
08 Jan 2024
The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline
Haonan Wang
Qianli Shen
Yao Tong
Yang Zhang
Kenji Kawaguchi
37
22
0
07 Jan 2024
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
43
8
0
06 Jan 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan
Xiangtai Li
Chong Zhou
Yining Li
Kai Chen
Chen Change Loy
VLM
29
51
0
05 Jan 2024
Multimodal Data Curation via Object Detection and Filter Ensembles
Tzu-Heng Huang
Changho Shin
Sui Jiet Tay
Dyah Adila
Frederic Sala
34
5
0
05 Jan 2024
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
Pengying Wu
Yao Mu
Bingxian Wu
Yi Hou
Ji Ma
Shanghang Zhang
Chang-rui Liu
LM&Ro
22
24
0
05 Jan 2024
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Muzammal Naseer
Luc Van Gool
F. Tombari
VLM
VPVLM
28
19
0
04 Jan 2024
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
Xiangyu Zhao
Yicheng Chen
Shilin Xu
Xiangtai Li
Xinjiang Wang
Yining Li
Haian Huang
ObjD
AI4CE
37
29
0
04 Jan 2024
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Yunzhi Yan
Haotong Lin
Chenxu Zhou
Weijie Wang
Haiyang Sun
Kun Zhan
Xianpeng Lang
Xiaowei Zhou
Sida Peng
3DGS
65
56
0
02 Jan 2024
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
18
12
0
31 Dec 2023
Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models
Han Jiang
Haosen Sun
Ruoxuan Li
Chi-Keung Tang
Yu-Wing Tai
DiffM
42
0
0
30 Dec 2023
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao
Long Zhao
Vijay Kumar B.G
Yumin Suh
Dimitris N. Metaxas
Manmohan Chandraker
S. Schulter
ObjD
VLM
32
5
0
29 Dec 2023
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo-Wen Zhang
Xiaolin Wei
Chunhua Shen
MLLM
31
33
0
28 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi-Xin Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
29
17
0
25 Dec 2023
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu
Lingzhi Zhang
Jianbo Shi
DiffM
42
16
0
24 Dec 2023
Learning Multi-Step Manipulation Tasks from A Single Human Demonstration
Dingkun Guo
30
5
0
23 Dec 2023
Revisiting Few-Shot Object Detection with Vision-Language Models
Anish Madan
Neehar Peri
Shu Kong
Deva Ramanan
VLM
24
6
0
22 Dec 2023
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection
Dongmei Zhang
Chang Li
Ray Zhang
Shenghao Xie
Wei Xue
Xiaodong Xie
Shanghang Zhang
VLM
25
14
0
22 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
60
122
0
21 Dec 2023
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao
Ziqi Zhou
Wenxuan Li
Shiyi Lan
Jieru Mei
Zhiding Yu
Alan L. Yuille
Yuyin Zhou
Cihang Xie
VLM
19
1
0
21 Dec 2023
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
Difei Gao
Lei Ji
Zechen Bai
Mingyu Ouyang
Peiran Li
...
Peiyi Wang
Xiangwu Guo
Hengxu Wang
Luowei Zhou
Mike Zheng Shou
LLMAG
23
21
0
20 Dec 2023
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis
Judith E. Fan
Yulia Gryaditskaya
VLM
3DV
18
1
0
18 Dec 2023
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
P. Nguyen
T.D. Ngo
E. Kalogerakis
Chuang Gan
Anh Tran
Cuong Pham
Khoi Duc Minh Nguyen
ISeg
23
51
0
17 Dec 2023
Previous
1
2
3
...
20
21
22
...
25
26
27
Next