arXiv:2303.05499
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,335 papers shown
Title
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
Wenke Xia
Dong Wang
Xincheng Pang
Zhigang Wang
Bin Zhao
Di Hu
Xuelong Li
LM&Ro
27
19
0
06 Nov 2023
Get the Ball Rolling: Alerting Autonomous Robots When to Help to Close the Healthcare Loop
Jiaxin Shen
Yanyao Liu
Ziming Wang
Ziyuan Jiao
Yufeng Chen
Wenjuan Han
12
0
0
05 Nov 2023
Precise Robotic Needle-Threading with Tactile Perception and Reinforcement Learning
Zhenjun Yu
Wenqiang Xu
Siqiong Yao
Jieji Ren
Tutian Tang
Yutong Li
Guoying Gu
Cewu Lu
26
8
0
04 Nov 2023
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation
Shichao Dong
Fayao Liu
Guosheng Lin
VLM
16
3
0
03 Nov 2023
UniFolding: Towards Sample-efficient, Scalable, and Generalizable Robotic Garment Folding
Han Xue
Yutong Li
Wenqiang Xu
Huanyu Li
Dongzhe Zheng
Cewu Lu
26
14
0
02 Nov 2023
Vision-Language Interpreter for Robot Task Planning
Keisuke Shirai
C. C. Beltran-Hernandez
Masashi Hamaya
Atsushi Hashimoto
Shohei Tanaka
Kento Kawaharazuka
Kazutoshi Tanaka
Yoshitaka Ushiku
Shinsuke Mori
LM&Ro
13
26
0
02 Nov 2023
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection
Thinh Phan
Khoa T. Vo
Duy Le
Gianfranco Doretto
Don Adjeroh
Ngan Le
VLM
21
9
0
01 Nov 2023
Learning to Follow Object-Centric Image Editing Instructions Faithfully
Tuhin Chakrabarty
Kanishk Singh
Arkadiy Saakyan
Smaranda Muresan
DiffM
17
6
0
29 Oct 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
26
2
0
28 Oct 2023
Fine-Tuning Language Models Using Formal Methods Feedback
Yunhao Yang
N. Bhatt
Tyler Ingebrand
William Ward
Steven Carr
Zhangyang Wang
Ufuk Topcu
19
8
0
27 Oct 2023
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
Tsun-Hsuan Wang
Alaa Maalouf
Wei Xiao
Yutong Ban
Alexander Amini
Guy Rosman
S. Karaman
Daniela Rus
19
41
0
26 Oct 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
19
15
0
26 Oct 2023
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network
Yiming Lin
Xiao-Bo Jin
Qiufeng Wang
Kaizhu Huang
24
3
0
25 Oct 2023
Open-NeRF: Towards Open Vocabulary NeRF Decomposition
Hao Zhang
Fang Li
Narendra Ahuja
19
11
0
25 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Bill Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM
MLLM
30
113
0
24 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM
ReLM
LRM
18
21
0
24 Oct 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
11
66
0
23 Oct 2023
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge
Te-Lin Wu
Yu Zhou
Nanyun Peng
24
8
0
23 Oct 2023
Open-Set Image Tagging with Multi-Grained Text Supervision
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
VLM
25
28
0
23 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
26
9
0
22 Oct 2023
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation
Francisco Eiras
Kemal Oksuz
Adel Bibi
Philip H. S. Torr
P. Dokania
25
1
0
20 Oct 2023
OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data
Yijie Zhou
Likun Cai
Xianhui Cheng
Zhongxue Gan
Xiangyang Xue
Wenchao Ding
3DV
VLM
11
13
0
20 Oct 2023
Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models
Zhaozheng Chen
Qianru Sun
VLM
27
7
0
19 Oct 2023
Object-aware Inversion and Reassembly for Image Editing
Zhen Yang
Dinggang Gui
Wen Wang
Hao Chen
Bohan Zhuang
Chunhua Shen
DiffM
22
14
0
18 Oct 2023
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Xinhua Cheng
Tianyu Yang
Jianan Wang
Yu Li
Lei Zhang
Jian Zhang
Li-ming Yuan
DiffM
23
43
0
18 Oct 2023
Language Models as Zero-Shot Trajectory Generators
Teyun Kwon
Norman Di Palo
Edward Johns
LM&Ro
25
45
0
17 Oct 2023
VcT: Visual change Transformer for Remote Sensing Image Change Detection
Bo Jiang
Zitian Wang
Xixi Wang
Ziyan Zhang
Lan Chen
Xiao Wang
Bin Luo
ViT
19
38
0
17 Oct 2023
Interactive Task Planning with Language Models
Boyi Li
Philipp Wu
Pieter Abbeel
Jitendra Malik
LM&Ro
34
33
0
16 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
440
0
14 Oct 2023
Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models
Zhen Zhang
Anran Lin
Chun Wai Wong
X. Chu
Qi Dou
K. W. S. Au
LM&Ro
22
3
0
13 Oct 2023
Learning to Act from Actionless Videos through Dense Correspondences
Po-Chen Ko
Jiayuan Mao
Yilun Du
Shao-Hua Sun
Josh Tenenbaum
22
74
0
12 Oct 2023
X-Pose: Detecting Any Keypoints
Jie-jin Yang
Ailing Zeng
Ruimao Zhang
Lei Zhang
31
6
0
12 Oct 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
27
3
0
12 Oct 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
15
7
0
12 Oct 2023
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing
Zijie Wu
Chaohui Yu
Zhen Zhu
Fan Wang
Xiang Bai
17
12
0
12 Oct 2023
Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation
Yinpei Dai
Run Peng
Sikai Li
Joyce Chai
LM&Ro
32
24
0
12 Oct 2023
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
Zeqiang Lai
Xizhou Zhu
Jifeng Dai
Yu Qiao
Wenhai Wang
MLLM
DiffM
40
22
0
11 Oct 2023
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
Wen-Hsuan Chu
Adam W. Harley
P. Tokmakov
Achal Dave
Leonidas J. Guibas
Katerina Fragkiadaki
VLM
18
7
0
10 Oct 2023
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
Kai-Qing Zhou
Kwonjoon Lee
Teruhisa Misu
Xin Eric Wang
LRM
19
3
0
09 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
21
11
0
08 Oct 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation
Meng Wei
Xiaoyu Yue
Wenwei Zhang
Shu Kong
Xihui Liu
Jiangmiao Pang
VLM
18
24
0
08 Oct 2023
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
Zhizheng Zhang
Wenxuan Xie
Xiaoyi Zhang
Yan Lu
21
10
0
07 Oct 2023
EasyPhoto: Your Smart AI Photo Generator
Ziheng Wu
Jiaqi Xu
Xinyi Zou
Kunzhe Huang
Xing Shi
Jun Huang
19
4
0
07 Oct 2023
CoralVOS: Dataset and Benchmark for Coral Video Segmentation
Ziqiang Zheng
Yaofeng Xie
Haixin Liang
Zhibin Yu
Sai-Kit Yeung
VOS
31
7
0
03 Oct 2023
MarineDet: Towards Open-Marine Object Detection
Haixin Liang
Ziqiang Zheng
Zeyu Ma
Sai-Kit Yeung
20
4
0
03 Oct 2023
Zero-Shot Refinement of Buildings' Segmentation Models using SAM
Ali Mayladan
Hasan Nasrallah
Hasan Moughnieh
Mustafa Shukor
A. Ghandour
22
4
0
03 Oct 2023
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation
Naitik Khandelwal
Xiao Liu
Mengmi Zhang
CLL
24
0
0
02 Oct 2023
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
Size Wu
Wenwei Zhang
Lumin Xu
Sheng Jin
Xiangtai Li
Wentao Liu
Chen Change Loy
CLIP
VLM
24
68
0
02 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
12
4
0
02 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
17
10
0
02 Oct 2023