2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,336 papers shown
Empowering Large Language Models on Robotic Manipulation with Affordance Prompting
Guangran Cheng
Chuheng Zhang
Wenzhe Cai
Li Zhao
Changyin Sun
Jiang Bian
LM&Ro
LLMAG
187
9
0
17 Apr 2024
FoundationGrasp: Generalizable Task-Oriented Grasping with Foundation Models
Chao Tang
Dehao Huang
Wenlong Dong
Ruinian Xu
Hong Zhang
34
9
0
16 Apr 2024
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Hongxin Zhang
Zeyuan Wang
Qiushi Lyu
Zheyuan Zhang
Sunli Chen
Tianmin Shu
Kwonjoon Lee
Yilun Du
Chuang Gan
48
12
0
16 Apr 2024
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Kim Hoang Tran
Phuc Vuong Do
Ngoc Quoc Ly
Ngan Le
36
4
0
15 Apr 2024
Zero-shot detection of buildings in mobile LiDAR using Language Vision Model
June Moh Goo
Zichao Zeng
Jan Boehm
43
2
0
15 Apr 2024
Zero-shot Building Age Classification from Facade Image Using GPT-4
Zichao Zeng
June Moh Goo
Xinglei Wang
Bin Chi
Meihui Wang
Jan Boehm
VLM
27
2
0
15 Apr 2024
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
Fangwei Zhong
Kui Wu
Hai Ci
Churan Wang
Hao Chen
OffRL
39
2
0
15 Apr 2024
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
Zhongrui Gui
Shuyang Sun
Runjia Li
Jianhao Yuan
Zhaochong An
Karsten Roth
Ameya Prabhu
Philip H. S. Torr
VLM
CLL
29
6
0
15 Apr 2024
VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection
Bonan Ding
Jin Xie
Jing Nie
Jiale Cao
24
2
0
15 Apr 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao
Renjie Pi
Jianhua Han
Xiaodan Liang
Hang Xu
Wei Zhang
Zhenguo Li
Dan Xu
VLM
ObjD
53
20
0
14 Apr 2024
FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation
Riza Velioglu
Robin Chan
Barbara Hammer
24
0
0
12 Apr 2024
Visual Context-Aware Person Fall Detection
Aleksander Nagaj
Zenjie Li
Dimitris Papadopoulos
Kamal Nasrollahi
19
1
0
11 Apr 2024
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang
Haoxuan You
Philipp Dufter
Bowen Zhang
Chen Chen
...
Tsu-jui Fu
William Yang Wang
Shih-Fu Chang
Zhe Gan
Yinfei Yang
ObjD
MLLM
101
44
0
11 Apr 2024
Move Anything with Layered Scene Diffusion
Jiawei Ren
Mengmeng Xu
Jui-Chieh Wu
Ziwei Liu
Tao Xiang
Antoine Toisoul
29
9
0
10 Apr 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie
Julong Wei
Zhengjun Wang
Ke Wu
Shansuai Yuan
Kaizhao Zhang
Jie Jia
Jieru Zhao
Zhongxue Gan
Wenchao Ding
40
7
0
10 Apr 2024
Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation
Sidra Aleem
Fangyijie Wang
Mayug Maniparambil
Eric Arazo
J. Dietlmeier
Guénolé Silvestre
Kathleen M. Curran
Noel E. O'Connor
Suzanne Little
VLM
MedIm
27
11
0
09 Apr 2024
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Jing Gu
Yilin Wang
Nanxuan Zhao
Wei Xiong
Qing Liu
Zhifei Zhang
He Zhang
Jianming Zhang
HyunJoon Jung
Xin Eric Wang
DiffM
32
8
0
08 Apr 2024
Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models
Yutao Ouyang
Jinhan Li
Yunfei Li
Zhongyu Li
Chao Yu
K. Sreenath
Yi Wu
49
15
0
08 Apr 2024
Hyperbolic Learning with Synthetic Captions for Open-World Detection
Fanjie Kong
Yanbei Chen
Jiarui Cai
Davide Modolo
VLM
ObjD
31
7
0
07 Apr 2024
DL-EWF: Deep Learning Empowering Women's Fashion with Grounded-Segment-Anything Segmentation for Body Shape Classification
Fatemeh Asghari
M. Soheili
Faeze Gholamrezaie
3DH
25
0
0
07 Apr 2024
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan
B. Vijaykumar
S. Schulter
Yun Fu
Manmohan Chandraker
LRM
ReLM
28
6
0
06 Apr 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
45
10
0
06 Apr 2024
DATENeRF: Depth-Aware Text-based Editing of NeRFs
Sara Rojas
Julien Philip
Kai Zhang
Sai Bi
Fujun Luan
Bernard Ghanem
Kalyan Sunkavalli
DiffM
27
3
0
06 Apr 2024
Mixed-Query Transformer: A Unified Image Segmentation Architecture
Pei Wang
Zhaowei Cai
Hao-Yu Yang
Ashwin Swaminathan
R. Manmatha
Stefano Soatto
75
2
0
06 Apr 2024
LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping
Kurran Singh
Tim Magoun
John J. Leonard
43
1
0
05 Apr 2024
Physical Property Understanding from Language-Embedded Feature Fields
Albert J. Zhai
Yuan Shen
Emily Y. Chen
Gloria X. Wang
Xinlei Wang
Sheng Wang
Kaiyu Guan
Shenlong Wang
33
13
0
05 Apr 2024
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Gihyun Kwon
Simon Jenni
Dingzeyu Li
Joon-Young Lee
Jong Chul Ye
Fabian Caba Heilbron
DiffM
45
13
0
05 Apr 2024
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu
Yongxin Yang
Shifeng Zhang
Fei Chen
Steven G. McDonagh
Gerasimos Lampouras
Ignacio Iacobacci
Sarah Parisot
37
10
0
03 Apr 2024
Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models
M. Pennisi
Giovanni Bellitto
S. Palazzo
Mubarak Shah
C. Spampinato
DiffM
21
0
0
03 Apr 2024
Text-driven Affordance Learning from Egocentric Vision
Tomoya Yoshida
Shuhei Kurita
Taichi Nishimura
Shinsuke Mori
37
5
0
03 Apr 2024
Red-Teaming Segment Anything Model
K. Jankowski
Bartlomiej Sobieski
Mateusz Kwiatkowski
J. Szulc
Michael F. Janik
Hubert Baniecki
P. Biecek
VLM
AAML
40
3
0
02 Apr 2024
Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions
Saptarshi Dasgupta
Akshat Gupta
Shreshth Tuli
Rohan Paul
27
2
0
02 Apr 2024
ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models
Vishnunandan L. N. Venkatesh
Byung-Cheol Min
LM&Ro
71
2
0
02 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
44
128
0
01 Apr 2024
Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts
Prakash Chandra Chhipa
Kanjar De
Meenakshi Subhash Chippa
Rajkumar Saini
Marcus Liwicki
ObjD
VLM
33
1
0
01 Apr 2024
DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
Sang-Kee Jo
Fei Pan
In-Jae Yu
Kyungsu Kim
30
2
0
30 Mar 2024
Efficient 3D Instance Mapping and Localization with Neural Fields
George Tang
Krishna Murthy Jatavallabhula
Antonio Torralba
ISeg
34
5
0
28 Mar 2024
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim M. Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiaohua Zhai
VLM
49
6
0
28 Mar 2024
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang
Yali Li
Taichi Liu
Hengshuang Zhao
Shengjin Wang
3DPC
ObjD
38
7
0
28 Mar 2024
Locate, Assign, Refine: Taming Customized Promptable Image Inpainting
Yulin Pan
Chaojie Mao
Zeyinzi Jiang
Zhen Han
Jingfeng Zhang
Xiangteng He
DiffM
44
2
0
28 Mar 2024
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
Jiaxing Chen
Yuxuan Liu
Dehu Li
Xiang An
Weimo Deng
Ziyong Feng
Yongle Zhao
Yin Xie
LRM
46
14
0
28 Mar 2024
Annolid: Annotate, Segment, and Track Anything You Need
Chen Yang
Thomas A. Cleland
VOS
21
2
0
27 Mar 2024
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Qiao Gu
Zhaoyang Lv
Duncan Frost
Simon Green
Julian Straub
Chris Sweeney
3DGS
EgoV
26
21
0
26 Mar 2024
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen
Yipeng Qin
Lingjie Liu
Jiangbo Lu
Guanbin Li
35
11
0
26 Mar 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Hongsheng Li
VGen
LRM
MLLM
63
37
0
25 Mar 2024
UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction
Xixuan Hao
Wei Chen
Yibo Yan
Siru Zhong
Kun Wang
Qingsong Wen
Yuxuan Liang
VLM
79
1
0
25 Mar 2024
Elysium: Exploring Object-level Perception in Videos via MLLM
Hang Wang
Yanjie Wang
Yongjie Ye
Yuxiang Nie
Can Huang
MLLM
42
19
0
25 Mar 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu
Sheng-Yu Huang
Yu-Chiang Frank Wang
34
0
0
25 Mar 2024
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim
Minyeong Kim
Junik Bae
Suhwan Choi
Sungkyung Kim
Buru Chang
VLM
24
3
0
24 Mar 2024
Segment Anything Model for Road Network Graph Extraction
Congrui Hetang
Haoru Xue
Cindy X. Le
Tianwei Yue
Wenping Wang
Yihui He
49
11
0
24 Mar 2024