Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

Computer Vision and Pattern Recognition (CVPR), 2022
28 March 2022
Yu Du
Fangyun Wei
Zihe Zhang
Miaojing Shi
Yue Gao
Guoqi Li
VPVLM, VLM
ArXiv (abs) | PDF | HTML | Github (181★)

Papers citing "Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model"

50 / 278 papers shown
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V
Dingning Liu
Xiaomeng Dong
Renrui Zhang
Xu Luo
Shiyang Feng
Xiaoshui Huang
Yongshun Gong
Zhihui Wang
160
18
0
15 Dec 2023
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
AAAI Conference on Artificial Intelligence (AAAI), 2023
Joonhyun Jeong
Geondo Park
Jayeon Yoo
Hyungsik Jung
Heesu Kim
VLM, ObjD
334
16
0
12 Dec 2023
Domain Prompt Learning with Quaternion Networks
Computer Vision and Pattern Recognition (CVPR), 2023
Qinglong Cao
Zhengqin Xu
Yuntian Chen
Chao Ma
Xiaokang Yang
VLM
228
20
0
12 Dec 2023
Object Recognition as Next Token Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
442
12
0
04 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM, VLM
256
18
0
03 Dec 2023
Language-conditioned Detection Transformer
Computer Vision and Pattern Recognition (CVPR), 2023
Jang Hyun Cho
Philipp Krahenbuhl
VLM, ObjD
175
4
0
29 Nov 2023
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Lorenzo Bianchi
F. Carrara
Nicola Messina
Claudio Gennaro
Fabrizio Falchi
ObjD
309
24
0
29 Nov 2023
Hardware Resilience Properties of Text-Guided Image Classifiers
Neural Information Processing Systems (NeurIPS), 2023
Syed Talal Wasim
Kabila Haile Soboka
Abdulrahman Mahmoud
Salman Khan
David Brooks
Gu-Yeon Wei
VLM
185
1
0
23 Nov 2023
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li
Weiwei Guo
Xue Yang
Ning Liao
Dunyun He
Jiaqi Zhou
Wenxian Yu
ObjD, VLM
176
20
0
20 Nov 2023
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen
Jinlin Wu
Zhen Lei
Zhaoxiang Zhang
Changwen Chen
269
28
0
18 Nov 2023
TENT: Connect Language Models with IoT Sensors for Zero-Shot Activity Recognition
Yunjiao Zhou
Jianfei Yang
Han Zou
Lihua Xie
VLM
170
27
0
14 Nov 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
Neural Information Processing Systems (NeurIPS), 2023
Cheng Cheng
Lin Song
Ruoyi Xue
Hang Wang
Hongbin Sun
Yixiao Ge
Ying Shan
VLM, ObjD
383
45
0
07 Nov 2023
Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion
Hao Zhou
Tiancheng Shen
Xu Yang
Hai Huang
Xiangtai Li
Lu Qi
Ming-Hsuan Yang
204
13
0
06 Nov 2023
Recognize Any Regions
Neural Information Processing Systems (NeurIPS), 2023
Haosen Yang
Chuofan Ma
Bin Wen
Yi Jiang
Zehuan Yuan
Xiatian Zhu
ObjD, VLM
319
3
0
02 Nov 2023
Text Augmented Spatial-aware Zero-shot Referring Image Segmentation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuchen Suo
Linchao Zhu
Yi Yang
244
18
0
27 Oct 2023
LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Chau Pham
Truong Vu
Khoi Duc Minh Nguyen
ObjD
274
27
0
26 Oct 2023
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Neural Information Processing Systems (NeurIPS), 2023
Chuofan Ma
Yi Jiang
Xin Wen
Zehuan Yuan
Xiaojuan Qi
ObjD, VLM
222
66
0
25 Oct 2023
On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection
Neural Information Processing Systems (NeurIPS), 2023
Sangha Park
J. Mok
Dahuin Jung
Saehyung Lee
Sung-Hoon Yoon
241
13
0
25 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD, VLM
236
14
0
22 Oct 2023
Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models
IEEE International Conference on Robotics and Automation (ICRA), 2023
Zhen Zhang
Anran Lin
Chun Wai Wong
Xiangyu Chu
Qi Dou
K. W. S. Au
LM&Ro
270
14
0
13 Oct 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Neural Information Processing Systems (NeurIPS), 2023
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC, ObjD
231
51
0
04 Oct 2023
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
International Conference on Learning Representations (ICLR), 2023
Size Wu
Wenwei Zhang
Lumin Xu
Sheng Jin
Xiangtai Li
Wentao Liu
Chen Change Loy
CLIP, VLM
213
102
0
02 Oct 2023
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
Shilin Xu
Xiangtai Li
Size Wu
Wenwei Zhang
Yunhai Tong
Chen Change Loy
ObjD, VLM
165
0
0
02 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
European Conference on Computer Vision (ECCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, VLM
206
6
0
29 Sep 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
International Journal of Computer Vision (IJCV), 2023
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffM, VLM
280
45
0
22 Sep 2023
Unsupervised Open-Vocabulary Object Localization in Videos
IEEE International Conference on Computer Vision (ICCV), 2023
Ke Fan
Zechen Bai
Tianjun Xiao
Dominik Zietlow
Max Horn
...
Bernt Schiele
Thomas Brox
Zheng Zhang
Yanwei Fu
Tong He
244
12
0
18 Sep 2023
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuan Gan
Zongxin Yang
Xihang Yue
Lingyun Sun
Yezhou Yang
211
91
0
10 Sep 2023
Distribution-Aware Prompt Tuning for Vision-Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Eulrang Cho
Jooyeon Kim
Hyunwoo J. Kim
VPVLM, VLM
150
44
0
06 Sep 2023
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
British Machine Vision Conference (BMVC), 2023
Yi Zhang
Ce Zhang
Zihan Liao
Yushun Tang
Zhihai He
BDL, VLM
259
11
0
03 Sep 2023
EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Shi
Sibei Yang
VLM, ObjD
201
57
0
03 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, VLM
294
37
0
02 Sep 2023
What Makes Good Open-Vocabulary Detector: A Disassembling Perspective
Jincheng Li
Chunyu Xie
Xiaoyu Wu
Bin Wang
Dawei Leng
VLM, ObjD
212
6
0
01 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
173
9
0
30 Aug 2023
Unsupervised Prototype Adapter for Vision-Language Models
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Yi Zhang
Ce Zhang
Xue-mei Hu
Z. He
VLM
224
8
0
22 Aug 2023
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models
IEEE International Conference on Computer Vision (ICCV), 2023
Dohwan Ko
Ji Soo Lee
M. Choi
Jaewon Chu
Jihwan Park
Hyunwoo J. Kim
151
6
0
18 Aug 2023
Taming Self-Training for Open-Vocabulary Object Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Zhao
S. Schulter
Long Zhao
Zhixing Zhang
Vijay Kumar B.G
Yumin Suh
Manmohan Chandraker
Dimitris N. Metaxas
VLM, ObjD
318
21
0
11 Aug 2023
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Kecheng Zheng
Wei Wu
Ruili Feng
Kai Zhu
Jiawei Liu
Deli Zhao
Zhengjun Zha
Wei Chen
Yujun Shen
VLM
186
12
0
27 Jul 2023
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng-En Wu
Yu Tian
Haichao Yu
Heng Wang
Pedro Morgado
Yu Hen Hu
Linjie Yang
NoLa, VPVLM, VLM
123
25
0
22 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjD, VLM
483
64
0
18 Jul 2023
Unified Open-Vocabulary Dense Visual Prediction
IEEE transactions on multimedia (IEEE TMM), 2023
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD, VLM
175
53
0
17 Jul 2023
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
R. Liu
Kailai Li
Kunyu Peng
Junwei Zheng
Ke Cao
Yufan Chen
Kailun Yang
Rainer Stiefelhagen
130
22
0
15 Jul 2023
Open-Vocabulary Object Detection via Scene Graph Discovery
ACM Multimedia (ACM MM), 2023
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
255
16
0
07 Jul 2023
Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Balamurali Murugesan
Rukhshanda Hussain
Rajarshi Bhattacharya
Ismail Ben Ayed
Jose Dolz
VLM, VPVLM
506
5
0
30 Jun 2023
Towards Open Vocabulary Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD, VLM
390
213
0
28 Jun 2023
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2023
Zijian Zhou
Oluwatosin O. Alabi
Meng Wei
Tom Vercauteren
Miaojing Shi
MedIm
205
35
0
15 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD, VLM
183
12
0
14 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
230
9
0
14 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
247
20
0
14 Jun 2023
Learning Domain-Aware Detection Head with Prompt Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haochen Li
Rui Zhang
Hantao Yao
Xinkai Song
Yifan Hao
Yongwei Zhao
Ling Li
Yunji Chen
VLM
267
24
0
09 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
200
4
0
07 Jun 2023