Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,336 papers shown
Title
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
96
5
0
05 Dec 2024
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
Ruili Feng
Han Zhang
Zhantao Yang
Jie Xiao
Zhilei Shu
Zhiheng Liu
Andy Zheng
Yukun Huang
Yu Liu
H. Zhang
VGen
87
9
0
04 Dec 2024
Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything
Y. Lee
S. K. Panda
Wei Wang
M. Jawed
62
0
0
04 Dec 2024
BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement
Miguel A. Vega-Torres
Anna Ribic
Borja García de Soto
A. Borrmann
3DPC
71
0
0
04 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
74
1
0
04 Dec 2024
Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation
Xuanlin Li
Tong Zhao
Xinghao Zhu
Jiuguang Wang
Tao Pang
Kuan Fang
82
4
0
03 Dec 2024
3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting
Ziyang Yan
Lei Li
Yihua Shao
Siyu Chen
Wuzong Kai
Jenq-Neng Hwang
Hao Zhao
Fabio Remondino
3DGS
77
3
0
02 Dec 2024
CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models
Zhixiang Guo
Siyuan Liang
Aishan Liu
Dacheng Tao
AAML
71
1
0
02 Dec 2024
PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control
Ruichen Wang
Junliang Zhang
Qingsong Xie
Chen Chen
H. Lu
DiffM
90
1
0
02 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
94
0
0
02 Dec 2024
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen
Zhuheng Song
Xiaoke Jiang
Yaoqing Hu
Junzhi Yu
Lei Zhang
3DH
HAI
78
0
0
02 Dec 2024
Sketch-Guided Motion Diffusion for Stylized Cinemagraph Synthesis
H. Jin
Hengyuan Chang
Xiaoxuan Xie
Zhengyang Wang
Xusheng Du
Shaojun Hu
H. Xie
DiffM
VGen
71
0
0
01 Dec 2024
GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision
Z. Li
Wenwei Han
Yujun Cai
Hao Jiang
Baolong Bi
Shuqin Gao
Honglong Zhao
Zhaoqi Wang
3DGS
67
1
0
30 Nov 2024
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
80
2
0
29 Nov 2024
Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection
Tsun-hin Cheung
Ka-Chun Fung
Songjiang Lai
Kwan-Ho Lin
Vincent To-Yee NG
K. Lam
72
0
0
28 Nov 2024
MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models
Angus Fung
A. H. Tan
Haitong Wang
B. Benhabib
G. Nejat
LM&Ro
121
1
0
27 Nov 2024
Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
Tianyi Wei
Dongdong Chen
Yifan Zhou
Xingang Pan
EGVM
88
2
0
27 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
94
1
0
27 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
109
6
0
27 Nov 2024
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
Hui-Yue Yang
Hui Chen
Ao Wang
Kai Chen
Zijia Lin
Yongliang Tang
Pengcheng Gao
Yuming Quan
J. Han
Guiguang Ding
VLM
78
2
0
26 Nov 2024
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
Peng Cui
Guande He
Dan Zhang
Zhijie Deng
Yinpeng Dong
Jun Zhu
82
0
0
26 Nov 2024
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia
Jishuo Li
Zhiwei Lin
Xinhao Wang
Y. Wang
Ming-Hsuan Yang
VLM
69
2
0
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
106
2
0
26 Nov 2024
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
Raktim Gautam Goswami
P. Krishnamurthy
Yann LeCun
Farshad Khorrami
92
1
0
26 Nov 2024
Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory
Zaira Manigrasso
Matteo Dunnhofer
Antonino Furnari
Moritz Nottebaum
Antonio Finocchiaro
Davide Marana
G. Farinella
C. Micheloni
70
1
0
25 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
71
3
0
25 Nov 2024
Leverage Task Context for Object Affordance Ranking
Haojie Huang
Hongchen Luo
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
71
0
0
25 Nov 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Hua Zhang
Xiaochun Cao
FAtt
82
4
0
25 Nov 2024
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
Si Liu
LRM
87
4
0
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
Sipeng Zheng
Zongqing Lu
106
1
0
25 Nov 2024
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions
Guanyu Zhou
Wenxuan Liu
Wenxin Huang
Xuemei Jia
X. Zhong
Chia-Wen Lin
CML
76
0
0
24 Nov 2024
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang
Shi-Yong Chen
Zhenxing Zhou
Siyi Li
Haoran Li
Wengang Zhou
H. Li
VLM
67
3
0
24 Nov 2024
AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks
Y. Li
Fan Ma
Yi Yang
DiffM
144
2
0
24 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
H. Zhang
Yueting Zhuang
DiffM
100
15
0
24 Nov 2024
Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data
Rui Huang
Henry Zheng
Yan Wang
Zhuofan Xia
Marco Pavone
Gao Huang
3DPC
VLM
83
1
0
23 Nov 2024
Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation
Jinwoo Ahn
Hyeokjoon Kwon
Hwiyeon Yoo
ObjD
VLM
77
0
0
23 Nov 2024
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin
Jooyoung Choi
Heeseung Kim
Sungroh Yoon
DiffM
87
8
0
23 Nov 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
S. Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
171
2
0
22 Nov 2024
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
Jiahao Hu
Tianxiong Zhong
Xuebo Wang
Boyuan Jiang
Xingye Tian
Fei Yang
Pengfei Wan
Di Zhang
VGen
72
2
0
22 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
116
1
0
22 Nov 2024
GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter
Aniruddha Bala
Rohan Jaiswal
Loay Rashid
Siddharth Roheda
72
0
0
21 Nov 2024
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation
Christoph Reinders
Radu Berdan
Beril Besbinar
Junji Otsuka
Daisuke Iso
78
2
0
20 Nov 2024
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao
Bin Zhu
Jingjing Chen
Chong-Wah Ngo
Yu-Gang Jiang
VLM
OffRL
69
0
0
19 Nov 2024
Text-guided Zero-Shot Object Localization
Jingjing Wang
Xinglin Piao
Zongzhi Gao
Bo Li
Yong Zhang
Baocai Yin
74
0
0
18 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
Wentao Bao
K. Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
42
2
0
17 Nov 2024
DGS-SLAM: Gaussian Splatting SLAM in Dynamic Environment
Mangyu Kong
Jaewon Lee
Seongwon Lee
Euntai Kim
3DGS
26
1
0
16 Nov 2024
Boundary Attention Constrained Zero-Shot Layout-To-Image Generation
Huancheng Chen
Jingtao Li
Weiming Zhuang
H. Vikalo
Lingjuan Lyu
DiffM
36
0
0
15 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Saeed Mian
Mohit Bansal
Chen Chen
LRM
56
1
0
15 Nov 2024
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting
Yian Wang
Xiaowen Qiu
Jiageng Liu
Zhehuan Chen
Jiting Cai
Yufei Wang
Tsun-Hsuan Wang
Zhou Xian
Chuang Gan
VGen
AI4CE
46
6
0
14 Nov 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Wei Wang
Z. Li
Qi Xu
Linfeng Li
Yiqing Cai
Botian Jiang
Hang Song
Xingcan Hu
Pengyu Wang
Li Xiao
29
1
0
14 Nov 2024
Previous
1
2
3
...
7
8
9
...
25
26
27
Next