2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,336 papers shown
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Jun Guo
Xiaojian Ma
Yue Fan
Huaping Liu
Qing Li
3DGS
36
26
0
22 Mar 2024
Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping
Wei Tang
Siang Chen
Pengwei Xie
Dingchang Hu
Wenming Yang
Guijin Wang
37
4
0
22 Mar 2024
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang
Feng Li
Zhaoyang Zeng
Tianhe Ren
Shilong Liu
Lei Zhang
VLM
27
37
0
21 Mar 2024
Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors
Nikolaos Tsagkas
Jack Rome
S. Ramamoorthy
Oisin Mac Aodha
Chris Xiaoxuan Lu
24
6
0
21 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
37
5
0
21 Mar 2024
Visual Imitation Learning of Task-Oriented Object Grasping and Rearrangement
Yichen Cai
Jianfeng Gao
Christoph Pohl
Tamim Asfour
32
4
0
20 Mar 2024
Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs
Yusuke Mikami
Andrew Melnik
Jun Miura
Ville Hautamaki
LM&Ro
LRM
58
4
0
20 Mar 2024
ReGround: Improving Textual and Spatial Grounding at No Cost
Yuseung Lee
Minhyuk Sung
DiffM
26
2
0
20 Mar 2024
Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing
Hangeol Chang
Jinho Chang
Jong Chul Ye
DiffM
37
3
0
20 Mar 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue
Jie Cheng
Longteng Guo
Xingyuan Dai
Zijia Zhao
Xingjian He
Gang Xiong
Yisheng Lv
Jing Liu
38
9
0
20 Mar 2024
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li
Hao Zhang
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Lei Zhang
37
19
0
19 Mar 2024
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
36
3
0
19 Mar 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
YiXuan Wu
Yizhou Wang
Shixiang Tang
Wenhao Wu
Tong He
Wanli Ouyang
Jian Wu
Philip H. S. Torr
ObjD
VLM
32
18
0
19 Mar 2024
VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation
Hao Wang
Jiayou Qin
Ashish Bastola
Xiwen Chen
John Suchanek
Zihao Gong
Abolfazl Razi
35
15
0
19 Mar 2024
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi
Shihao Zhao
Xianbiao Qi
Jianan Wang
Yukai Shi
Qianyu Chen
Bin Liang
Kam-Fai Wong
Lei Zhang
DiffM
VGen
24
15
0
18 Mar 2024
DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
Hyeonho Jeong
Jinho Chang
Geon Yeong Park
Jong Chul Ye
DiffM
VGen
27
13
0
18 Mar 2024
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu
Yunzhi Zhuge
Lu Zhang
Ping Hu
Dong Wang
Huchuan Lu
You He
VLM
KELM
CLL
OODD
108
68
0
18 Mar 2024
NetTrack: Tracking Highly Dynamic Objects with a Net
Guang-Zheng Zheng
Shijie Lin
Haobo Zuo
Changhong Fu
Jia Pan
33
10
0
17 Mar 2024
Towards Neuro-Symbolic Video Understanding
Minkyu Choi
Harsh Goel
Mohammad Omama
Yunhao Yang
Sahil Shah
Sandeep P. Chinchali
32
8
0
16 Mar 2024
Active Label Correction for Semantic Segmentation with Foundation Models
Hoyoung Kim
S. Hwang
Suha Kwak
Jungseul Ok
VLM
34
1
0
16 Mar 2024
GazeFusion: Saliency-Guided Image Generation
Yunxiang Zhang
Nan Wu
Connor Z. Lin
Gordon Wetzstein
Qi Sun
42
0
0
16 Mar 2024
Cannabis Seed Variant Detection using Faster R-CNN
Toqi Tahamid Sarker
Taminul Islam
Khaled R Ahmed
28
2
0
15 Mar 2024
Autonomous Monitoring of Pharmaceutical R&D Laboratories with 6 Axis Arm Equipped Quadruped Robot and Generative AI: A Preliminary Study
Shunichi Hato
Nozomi Ogawa
31
1
0
15 Mar 2024
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors
Atif Belal
Akhil Meethal
Francisco Perdigon Romero
M. Pedersoli
Eric Granger
35
1
0
14 Mar 2024
PosSAM: Panoptic Open-vocabulary Segment Anything
VS Vibashan
Shubhankar Borse
Hyojin Park
Debasmit Das
Vishal M. Patel
Munawar Hayat
Fatih Porikli
VLM
MLLM
38
6
0
14 Mar 2024
OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
Yinan Deng
Jiahui Wang
Jingyu Zhao
Xinyu Tian
Guangyan Chen
Yi Yang
Yufeng Yue
3DV
26
13
0
14 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Hongsheng Li
Bernt Schiele
Liwei Wang
VLM
35
10
0
14 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
ObjD
36
12
0
14 Mar 2024
Annotation Free Semantic Segmentation with Vision Foundation Models
Soroush Seifi
Daniel Olmeda Reino
Fabien Despinoy
Rahaf Aljundi
VLM
29
1
0
14 Mar 2024
Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines
Liang Wu
X.-G. Ma
34
1
0
14 Mar 2024
CART: Caltech Aerial RGB-Thermal Dataset in the Wild
Connor T. Lee
Matthew O. Anderson
Nikhil Raganathan
Xingxing Zuo
Kevin Do
Georgia Gkioxari
Soon-Jo Chung
40
7
0
13 Mar 2024
GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Jing Wu
Jiawang Bian
Xinghui Li
Guangrun Wang
Ian D Reid
Philip H. S. Torr
V. Prisacariu
3DGS
27
33
0
13 Mar 2024
Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models
Wensheng Liang
Ruiyan Zhuang
Xianwei Shi
Shuai Li
Zhicheng Wang
Xiaoguang Ma
CVBM
AI4CE
25
1
0
13 Mar 2024
Learning Generalizable Feature Fields for Mobile Manipulation
Ri-Zhao Qiu
Yafei Hu
Ge Yang
Yuchen Song
Yang Fu
...
Jiteng Mu
Ruihan Yang
Nikolay A. Atanasov
Sebastian Scherer
Xiaolong Wang
29
25
0
12 Mar 2024
TFCounter: Polishing Gems for Training-Free Object Counting
Pan Ting
Jianfeng Lin
Wenhao Yu
Wenlong Zhang
Xiaoying Chen
Jinlu Zhang
Binqiang Huang
35
0
0
12 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Yang Jiao
Shaoxiang Chen
Zequn Jie
Jing Chen
Lin Ma
Yueping Jiang
MLLM
37
18
0
12 Mar 2024
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin
Yunshuang Nie
Ziming Wei
Jiaqi Chen
Shikui Ma
Jianhua Han
Hang Xu
Xiaojun Chang
Xiaodan Liang
LM&Ro
LRM
60
20
0
12 Mar 2024
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions
Lan Wang
Vishnu Naresh Boddeti
Sernam Lim
VGen
DiffM
37
0
0
11 Mar 2024
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
Tiancheng Zhao
Peng Liu
Xuan He
Lu Zhang
Kyusong Lee
ObjD
43
8
0
11 Mar 2024
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao
Lei Shang
Chang-Shu Liu
Baigui Sun
Xiang Ji
Jie Chen
CVBM
36
3
0
11 Mar 2024
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
Deshun Yang
Luhui Hu
Yu Tian
Zihao Li
Chris Kelly
Bang Yang
Cindy Yang
Yuexian Zou
VGen
33
12
0
10 Mar 2024
AdaFold: Adapting Folding Trajectories of Cloths via Feedback-loop Manipulation
A. Longhini
Michael C. Welle
Zackory M. Erickson
Danica Kragic
AI4CE
43
6
0
10 Mar 2024
MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu
Zilan Wang
Leyang Li
Yanzhu Liu
A. Kong
DiffM
39
76
0
10 Mar 2024
Grasping Trajectory Optimization with Point Clouds
Yu Xiang
Sai Haneesh Allu
Rohith Peddi
Tyler Summers
Vibhav Gogate
3DPC
19
2
0
08 Mar 2024
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors
Anindya Mondal
Sauradip Nag
Xiatian Zhu
Anjan Dutta
36
3
0
08 Mar 2024
R²-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
Xiang Li
Kai Qiu
Jinglu Wang
Xiaohao Xu
Rita Singh
Kashu Yamazaki
Hao Chen
Xiaonan Huang
Bhiksha Raj
VOS
40
1
0
07 Mar 2024
Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery
Wei Zhang
Miaoxin Cai
Tong Zhang
Guoqiang Lei
Zhuang Yin
Xuerui Mao
27
6
0
06 Mar 2024
FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion
Hao Wang
Sayed Pedram Haeri Boroujeni
Xiwen Chen
Ashish Bastola
Huayu Li
Abolfazl Razi
27
1
0
06 Mar 2024
Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks
Jianfeng Gao
Xiaoshu Jin
F. Krebs
Noémie Jaquier
Tamim Asfour
SSL
39
14
0
05 Mar 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan
Jaemin Cho
Elias Stengel-Eskin
Mohit Bansal
VLM
ObjD
51
29
0
04 Mar 2024