arXiv: 2303.05499
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,336 papers shown
Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
Jieren Deng
Haojian Zhang
Kun Ding
Jianhua Hu
Xingxuan Zhang
Yunkuan Wang
VLM
ObjD
72
4
0
04 Mar 2024
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen
Zhuokai Zhao
Hongyin Luo
Huaxiu Yao
Bo Li
Jiawei Zhou
MLLM
46
57
0
01 Mar 2024
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
Yulong Liu
Yunlong Yuan
Chunwei Wang
Jianhua Han
Yongqiang Ma
Li Zhang
Nanning Zheng
Hang Xu
LLMAG
31
5
0
28 Feb 2024
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
Pau de Jorge
Riccardo Volpi
P. Dokania
Philip H. S. Torr
Grégory Rogez
DiffM
53
4
0
26 Feb 2024
StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention
SeungWon Seo
Suho Lee
Sangheum Hwang
30
0
0
25 Feb 2024
PhyPlan: Compositional and Adaptive Physical Task Reasoning with Physics-Informed Skill Networks for Robot Manipulators
Harshil Vagadia
Mudit Chopra
Abhinav Barnawal
Tamajit Banerjee
Shreshth Tuli
Souvik Chakraborty
Rohan Paul
PINN
LRM
24
2
0
24 Feb 2024
"It Is Hard to Remove from My Eye": Design Makeup Residue Visualization System for Chinese Traditional Opera (Xiqu) Performers
Zeyu Xiong
Shihan Fu
Yanying Zhu
Chenqing Zhu
Xiaojuan Ma
Mingming Fan
50
2
0
24 Feb 2024
RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Hanxiao Jiang
Binghao Huang
Ruihai Wu
Zhuoran Li
Shubham Garg
H. Nayyeri
Shenlong Wang
Yunzhu Li
34
17
0
23 Feb 2024
OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding
Francis Engelmann
Ayca Takmaz
Jonas Schult
Elisabetta Fedele
Johanna Wald
...
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
36
3
0
23 Feb 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
DeiSAM: Segment Anything with Deictic Prompting
Hikaru Shindo
Manuel Brack
Gopika Sudhakaran
D. Dhami
P. Schramowski
Kristian Kersting
VLM
29
2
0
21 Feb 2024
Aria Everyday Activities Dataset
Zhaoyang Lv
Nickolas Charron
Pierre Moulon
Alexander Gamino
Cheng Peng
...
Yuyang Zou
Richard A. Newcombe
Jakob Julian Engel
Xiaqing Pan
Carl Ren
29
10
0
20 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
51
41
0
19 Feb 2024
ISCUTE: Instance Segmentation of Cables Using Text Embedding
Shir Kozlovsky
O. Joglekar
Dotan Di Castro
32
2
0
19 Feb 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
60
47
0
18 Feb 2024
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions
Wenxuan Wang
Yisi Zhang
Xingjian He
Yichen Yan
Zijia Zhao
Xinlong Wang
Jing Liu
LM&Ro
25
4
0
17 Feb 2024
CoLLaVO: Crayon Large Language and Vision mOdel
Byung-Kwan Lee
Beomchan Park
Chae Won Kim
Yonghyun Ro
VLM
MLLM
24
16
0
17 Feb 2024
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
Yuxuan Kuang
Hai Lin
Meng Jiang
LM&Ro
31
26
0
16 Feb 2024
PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
Hyunsoo Cha
Byungjun Kim
Hanbyul Joo
13
4
0
16 Feb 2024
GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians
Haimin Luo
Ouyang Min
Zijun Zhao
Suyi Jiang
Longwen Zhang
Qixuan Zhang
Wei Yang
Lan Xu
Jingyi Yu
3DGS
30
26
0
16 Feb 2024
Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation
Junjie Shentu
Matthew Watson
Noura Al Moubayed
15
0
0
15 Feb 2024
Lester: rotoscope animation through video object segmentation and tracking
Ruben Tous
DiffM
VOS
31
0
0
15 Feb 2024
Magic-Me: Identity-Specific Video Customized Diffusion
Ze Ma
Daquan Zhou
Chun-Hsiao Yeh
Xue-She Wang
Xiuyu Li
Huanrui Yang
Zhen Dong
Kurt Keutzer
Jiashi Feng
VGen
DiffM
32
31
0
14 Feb 2024
Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance
Raza Imam
Muhammad Huzaifa
Nabil Mansour
Shaher Bano Mirza
Fouad Lamghari
20
0
0
10 Feb 2024
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan
Min Bai
Weifeng Chen
Xiong Zhou
Qixing Huang
Erran L. Li
VLM
23
18
0
09 Feb 2024
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng
Yujie Zhong
Zequn Jie
Weidi Xie
Lin Ma
ObjD
29
13
0
08 Feb 2024
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou
You Li
Fan Ma
Zongxin Yang
Yi Yang
DiffM
20
57
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
126
107
0
08 Feb 2024
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Maitreya Patel
Sangmin Jung
Chitta Baral
Yezhou Yang
VLM
31
28
0
07 Feb 2024
InCoRo: In-Context Learning for Robotics Control with Feedback Loops
Jiaqiang Ye Zhu
Carla Gomez Cano
David Vazquez Bermudez
Michal Drozdzal
LRM
30
7
0
07 Feb 2024
EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
Zhuoyang Zhang
Han Cai
Song Han
VLM
27
3
0
07 Feb 2024
FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models
Chuhao Liu
Ke Wang
Jieqi Shi
Zhijian Qiao
Shaojie Shen
VLM
33
5
0
07 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
34
1
0
06 Feb 2024
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang
Runyu Ding
Ellis L Brown
Xiaojuan Qi
Saining Xie
LM&Ro
50
19
0
05 Feb 2024
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang
Trevor Darrell
Sai Saketh Rambhatla
Rohit Girdhar
Ishan Misra
VLM
DiffM
32
84
0
05 Feb 2024
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
28
38
0
05 Feb 2024
Applying Unsupervised Semantic Segmentation to High-Resolution UAV Imagery for Enhanced Road Scene Parsing
Zihan Ma
Yongshang Li
Ronggui Ma
Chen Liang
11
2
0
05 Feb 2024
A Survey on Robotics with Foundation Models: toward Embodied AI
Zhiyuan Xu
Kun Wu
Junjie Wen
Jinming Li
Ning Liu
Zhengping Che
Jian Tang
AI4CE
LRM
LM&Ro
23
24
0
04 Feb 2024
Region-Based Representations Revisited
Michal Shlapentokh-Rothman
Ansel Blume
Yao Xiao
Yuqun Wu
TV Sethuraman
Heyi Tao
Jae Yong Lee
Wilfredo Torres
Yu-xiong Wang
Derek Hoiem
32
5
0
04 Feb 2024
Language-guided Active Sensing of Confined, Cluttered Environments via Object Rearrangement Planning
Weihan Chen
Hanwen Ren
A. H. Qureshi
LM&Ro
10
1
0
04 Feb 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Hasan Hammoud
Hani Itani
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Bernard Ghanem
CLIP
VLM
120
36
0
02 Feb 2024
Boximator: Generating Rich and Controllable Motions for Video Synthesis
Jiawei Wang
Yuchen Zhang
Jiaxin Zou
Yan Zeng
Guoqiang Wei
Liping Yuan
Hang Li
DiffM
VGen
27
43
0
02 Feb 2024
Conditioning non-linear and infinite-dimensional diffusion processes
E. Baker
Gefan Yang
Michael L. Severinsen
C. Hipsley
Stefan Sommer
DiffM
36
6
0
02 Feb 2024
LINGO-Space: Language-Conditioned Incremental Grounding for Space
Dohyun Kim
Nayoung Oh
Deokmin Hwang
Daehyung Park
20
6
0
02 Feb 2024
A Survey for Foundation Models in Autonomous Driving
Haoxiang Gao
Yaqian Li
Kaiwen Long
Ming Yang
Yiqing Shen
VLM
LRM
53
23
0
02 Feb 2024
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
28
12
0
31 Jan 2024
SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition
Xu Hu
Yuxi Wang
Lue Fan
Junsong Fan
Junran Peng
Zhen Lei
Qing Li
Zhaoxiang Zhang
3DGS
42
8
0
31 Jan 2024
Rapid post-disaster infrastructure damage characterisation enabled by remote sensing and deep learning technologies -- a tiered approach
Nadiia Kopiika
A. Karavias
P. Krassakis
Zehao Ye
Jelena Ninić
N. Shakhovska
Nikolaos Koukouzas
S. Argyroudis
S. Mitoulis
14
8
0
31 Jan 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
VLM
ObjD
16
246
0
30 Jan 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
24
88
0
30 Jan 2024