Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.04514
Cited By
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
10 April 2023
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment"
50 / 63 papers shown
Title
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
43
0
0
06 May 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
J. Zhang
...
Jiahui Lv
Z. Liu
Tengyuan Shi
Qingjie Liu
Y. Wang
MLLM
VLM
55
1
0
13 Apr 2025
GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection
Xingyu Peng
Si Liu
Chen Gao
Yan Bai
Beipeng Mu
Xiaofei Wang
Huaxia Xia
62
0
0
26 Mar 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
H. Zhang
X. Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Y. Yang
Zhe Gan
CLIP
VLM
65
7
0
20 Feb 2025
YOLO-UniOW: Efficient Universal Open-World Object Detection
Lihao Liu
Juexiao Feng
Hui Chen
Ao Wang
Lin Song
J. Han
Guiguang Ding
ObjD
VLM
33
2
0
31 Dec 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
24
0
0
13 Nov 2024
ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images
Timing Yang
Yuanliang Ju
Li Yi
3DPC
29
3
0
31 Oct 2024
CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models
J. Gao
Chen Cai
Ruoyu Wang
Wenyang Liu
Kim-Hui Yap
Kratika Garg
Boon-Siew Han
VLM
18
0
0
21 Oct 2024
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability
Yusuke Hosoya
Masanori Suganuma
Takayuki Okatani
ObjD
16
0
0
20 Oct 2024
Open World Object Detection: A Survey
Yiming Li
Yi Wang
Wenqian Wang
Dan Lin
Bingbing Li
Kim-Hui Yap
ObjD
30
0
0
15 Oct 2024
OVA-Det: Open Vocabulary Aerial Object Detection with Image-Text Collaboration
Guoting Wei
Xia Yuan
Yu Liu
Zhenhao Shang
Kelu Yao
Peng Wang
Qingsen Yan
Chunxia Zhao
Haokui Zhang
Rong Xiao
VLM
ObjD
39
1
0
22 Aug 2024
Dynamic Object Queries for Transformer-based Incremental Object Detection
Jichuan Zhang
Wei Li
Shuang Cheng
Yali Li
Shengjin Wang
23
0
0
31 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
31
1
0
23 Jul 2024
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du
Yu Wang
Yifan Sun
Luting Wang
Yue Liao
Gang Zhang
Errui Ding
Yan Wang
Jingdong Wang
Si Liu
VLM
ObjD
25
1
0
16 Jul 2024
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou
Zheng Zhu
Holger Caesar
Miaojing Shi
VLM
24
2
0
15 Jul 2024
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
Yu Wang
Xiangbo Su
Qiang Chen
Xinyu Zhang
Teng Xi
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
ObjD
VLM
34
0
0
15 Jul 2024
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
Xingyu Peng
Yan Bai
Chen Gao
Lirong Yang
Fei Xia
Beipeng Mu
Xiaofei Wang
Si Liu
ObjD
27
3
0
12 Jul 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
VLM
CLIP
32
13
0
11 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
28
0
0
07 Jun 2024
Radar Spectra-Language Model for Automotive Scene Parsing
Mariia Pushkareva
Yuri Feldman
Csaba Domokos
K. Rambach
Dotan Di Castro
44
1
0
04 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
F. Khan
VLM
ISeg
68
6
0
04 Jun 2024
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
28
6
0
02 Jun 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
...
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
ObjD
VLM
29
32
0
16 May 2024
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong
You-Chen Liu
Lai Xing Ng
Benoit R. Cottereau
Wei Tsang Ooi
VLM
29
12
0
08 May 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao
Renjie Pi
Jianhua Han
Xiaodan Liang
Hang Xu
Wei Zhang
Zhenguo Li
Dan Xu
VLM
ObjD
34
19
0
14 Apr 2024
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu
Yongxin Yang
Shifeng Zhang
Fei Chen
Steven G. McDonagh
Gerasimos Lampouras
Ignacio Iacobacci
Sarah Parisot
37
10
0
03 Apr 2024
Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts
Prakash Chandra Chhipa
Kanjar De
Meenakshi Subhash Chippa
Rajkumar Saini
Marcus Liwicki
ObjD
VLM
26
1
0
01 Apr 2024
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang
Feng Li
Zhaoyang Zeng
Tianhe Ren
Shilong Liu
Lei Zhang
VLM
27
32
0
21 Mar 2024
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray
Zi Huang
Tatsuya Harada
Yadan Luo
21
9
0
20 Mar 2024
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin
Yi-Xin Jiang
Lizhen Qu
Zehuan Yuan
Jianfei Cai
ObjD
VLM
38
13
0
15 Mar 2024
Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection
Hao Li
Wei Wang
Cong Wang
Zhigang Luo
Xinwang Liu
KenLi Li
Xiaochun Cao
ObjD
13
0
0
02 Feb 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
VLM
ObjD
16
242
0
30 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
51
16
0
16 Jan 2024
Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
Rongqin Liang
Yuanman Li
Jiantao Zhou
Xia Li
18
6
0
07 Jan 2024
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu
Yuanfan Guo
Jianhua Han
Minzhe Niu
Yihan Zeng
Songcen Xu
Zeyi Huang
Zhao Zhong
Wei Zhang
Hang Xu
26
4
0
27 Dec 2023
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
P. Nguyen
T.D. Ngo
E. Kalogerakis
Chuang Gan
Anh Tran
Cuong Pham
Khoi Duc Minh Nguyen
ISeg
16
51
0
17 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi-Xin Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
25
38
0
14 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
87
68
0
05 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
21
17
0
01 Dec 2023
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li
Weiwei Guo
Xue Yang
Ning Liao
Dunyun He
Jiaqi Zhou
Wenxian Yu
ObjD
VLM
15
7
0
20 Nov 2023
EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge
Bufang Yang
Lixing He
Neiwen Ling
Zhenyu Yan
Guoliang Xing
Xian Shuai
Xiaozhe Ren
Xin Jiang
43
20
0
18 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
19
1
0
05 Nov 2023
Recognize Any Regions
Haosen Yang
Chuofan Ma
Bin Wen
Yi-Xin Jiang
Zehuan Yuan
Xiatian Zhu
ObjD
VLM
38
3
0
02 Nov 2023
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma
Yi-Xin Jiang
Xin Wen
Zehuan Yuan
Xiaojuan Qi
ObjD
VLM
18
48
0
25 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
21
9
0
22 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
17
32
0
20 Oct 2023
A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design
Manna Dai
Yang Jiang
Fengxia Yang
Joyjit Chattoraj
Yingzhi Xia
Xinxing Xu
Weijiang Zhao
M. Dao
Yong Liu
GAN
21
1
0
18 Oct 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
8
33
0
04 Oct 2023
MarineDet: Towards Open-Marine Object Detection
Haixin Liang
Ziqiang Zheng
Zeyu Ma
Sai-Kit Yeung
18
4
0
03 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
9
3
0
29 Sep 2023
1
2
Next