Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

Computer Vision and Pattern Recognition (CVPR), 2022
28 March 2022
Yu Du
Fangyun Wei
Zihe Zhang
Miaojing Shi
Yue Gao
Guoqi Li
VPVLM, VLM
ArXiv (abs) | PDF | HTML | Github (181★)

Papers citing "Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model"

50 / 278 papers shown
GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models
Bin Wang
Ruotong Hu
Wenqian Wang
W. Li
Mingliang Gao
Runmin Cong
Wei Zhang
VLM
96
0
0
27 Nov 2025
OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection
Chujie Wang
Jianyu Lu
Zhiyuan Luo
Xi Chen
Chu He
LM&Ro
226
0
0
26 Nov 2025
ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis
Advik Sinha
Saurabh Atreya
Aashutosh A V
Sk Aziz Ali
Abhijit Das
CLIP
120
0
0
25 Nov 2025
State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection
Jiaying Zhou
Qingchao Chen
92
0
0
22 Nov 2025
Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Tiny Object Detection
Xian-Hong Huang
Hui-Kai Su
Chi-Chia Sun
Jun-Wei Hsieh
ObjD
360
0
0
07 Nov 2025
GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning
Haonan Yuan
Qingyun Sun
Junhua Shi
Xingcheng Fu
Bryan Hooi
Jianxin Li
Philip S. Yu
VLM
224
1
0
05 Nov 2025
A Retrospect to Multi-prompt Learning across Vision and Language
IEEE International Conference on Computer Vision (ICCV), 2023
Ziliang Chen
Xin Huang
Quanlong Guan
Liang Lin
Weiqi Luo
VPVLM, VLM
365
10
0
31 Oct 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
Xiaoxing Hu
Kaicheng Yang
Ziyang Gong
Qi Ming
Zonghao Guo
Xiang An
Ziyong Feng
Junchi Yan
Xue Yang
CLIP, VLM
195
0
0
21 Oct 2025
CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD, LRM, VLM
281
1
0
16 Oct 2025
Cross-View Open-Vocabulary Object Detection in Aerial Imagery
Jyoti Kini
Rohit Gupta
Mubarak Shah
ObjD, VLM
173
0
0
04 Oct 2025
Toward a Holistic Approach to Continual Model Merging
Hoang Phan
Sungmin Cha
Tung Lam Tran
Qi Lei
MoMe, CLL
182
1
0
28 Sep 2025
VISION: Prompting Ocean Vertical Velocity Reconstruction from Incomplete Observations
Yuan Gao
Hao Wu
Qingsong Wen
Kun Wang
X. Wu
Xiaomeng Huang
VLM
180
0
0
25 Sep 2025
Attn-Adapter: Attention Is All You Need for Online Few-shot Learner of Vision-Language Model
Phuoc-Nguyen Bui
Khanh-Binh Nguyen
Hyunseung Choo
VLM
280
0
0
04 Sep 2025
OVGrasp: Open-Vocabulary Grasping Assistance via Multimodal Intent Detection
Chen Hu
Shan Luo
Letizia Gionfrida
76
0
0
04 Sep 2025
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review
Information Fusion (Inf. Fusion), 2025
Ranjan Sapkota
Manoj Karkee
ObjD, VLM
267
13
0
25 Aug 2025
Constrained Prompt Enhancement for Improving Zero-Shot Generalization of Vision-Language Models
Xiaojie Yin
Qilong Wang
Q. Hu
VLM
136
0
0
24 Aug 2025
Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes
Xinhao Xiang
Kuan-Chuan Peng
Suhas Lohit
Michael Jeffrey Jones
Jiawei Zhang
3DPC
126
1
0
22 Aug 2025
Incremental Object Detection with Prompt-based Methods
Matthias Neuwirth-Trapp
Maarten Bieshaar
Danda Pani Paudel
Luc Van Gool
CLL, VLM
205
0
0
20 Aug 2025
DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition
Haijing Liu
Tao Pu
Hefeng Wu
Keze Wang
Guanbin Li
ObjD, VLM
110
0
0
07 Aug 2025
Dual-Stream Attention with Multi-Modal Queries for Object Detection in Transportation Applications
Noreen Anwar
Guillaume-Alexandre Bilodeau
W. Bouachir
65
0
0
06 Aug 2025
ODOV: Towards Open-Domain Open-Vocabulary Object Detection
Yupeng Zhang
Ruize Han
Fangnan Zhou
Song Wang
Wei Feng
Liang Wan
ObjD, VLM
157
0
0
02 Aug 2025
Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
Xinyu Chen
Haotian Zhai
Can Zhang
Xiupeng Shi
Ruirui Li
VLM
205
0
0
02 Aug 2025
PEACE: Prompt Engineering Automation for CLIPSeg Enhancement for Safe-Landing Zone Segmentation
Haechan Mark Bong
Rongge Zhang
Ricardo de Azambuja
Giovanni Beltrame
349
2
0
01 Jul 2025
Open World Object Detection: A Survey
Yiming Li
Yi Wang
Wenqian Wang
Dan Lin
Bingbing Li
Kim-Hui Yap
ObjD
332
19
0
01 Jul 2025
SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes
Yifan Yang
Zhen-ying Zhang
Rupak Vignesh Swaminathan
Jing Liu
Nathan Susanj
Zheng Zhang
VLM
174
1
0
26 Jun 2025
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
Xiaoya Lu
Zeren Chen
Xuhao Hu
Yijin Zhou
Weichen Zhang
Dongrui Liu
Lu Sheng
Jing Shao
312
6
0
19 Jun 2025
Hallucinate, Ground, Repeat: A Framework for Generalized Visual Relationship Detection
Shanmukha Vellamcheti
Sanjoy Kundu
Sathyanarayanan N. Aakur
211
0
0
06 Jun 2025
un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
Yinqi Li
Jiahe Zhao
Hong Chang
Ruibing Hou
Shiguang Shan
Xilin Chen
CLIP, VLM
289
1
0
30 May 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Chenbin Pan
Wenbin He
Zhengzhong Tu
Liu Ren
LRM, VLM
459
2
0
29 May 2025
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Peter Robicheaux
Matvei Popov
Anish Madan
Isaac Robinson
Joseph Nelson
Deva Ramanan
Neehar Peri
ObjD, VLM
336
14
0
27 May 2025
Open-Det: An Efficient Learning Framework for Open-Ended Detection
Guiping Cao
Tao Wang
Wenjian Huang
X. Lan
Jianguo Zhang
Shihong Deng
ObjD, VLM
177
1
0
27 May 2025
From Data to Modeling: Fully Open-vocabulary Scene Graph Generation
Zuyao Chen
Jinlin Wu
Zhen Lei
Chang Wen Chen
166
0
0
26 May 2025
Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
Wei Jie Yeo
Rui Mao
Moloud Abdar
Erik Cambria
Frank Xing
264
3
0
23 May 2025
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Ta Duc Huy
Duy Anh Huynh
Yutong Xie
Yuankai Qi
Qi Chen
...
Anton van den Hengel
Zhibin Liao
Minh-Son To
Johan Verjans
Vu Minh Hieu Phan
364
2
0
21 May 2025
Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
Lucas Choi
Ross Greer
VLM
363
1
0
14 May 2025
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
European Conference on Computer Vision (ECCV), 2025
Bin-Bin Gao
241
13
0
14 May 2025
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Wenwen Qiang
Jianqi Zhang
Jingyao Wang
Changwen Zheng
VLM
333
0
0
10 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
277
5
0
07 May 2025
Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification
Neng Dong
Shuanglin Yan
Liyan Zhang
Jinhui Tang
262
0
0
01 May 2025
EarthGPT-X: A Spatial MLLM for Multi-level Multi-Source Remote Sensing Imagery Understanding with Visual Prompting
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025
Wei Zhang
Miaoxin Cai
Yaqian Ning
Tianze Zhang
Yin Zhuang
He Chen
He Chen
Jun Li
Xuerui Mao
315
0
0
17 Apr 2025
Generalized Visual Relation Detection with Diffusion Models
Kaifeng Gao
Siqi Chen
Hanwang Zhang
Jun Xiao
Yueting Zhuang
Qianru Sun
265
0
0
16 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
Jing Zhang
...
Jiahui Lv
Ziqiang Liu
Tengyuan Shi
Qingjie Liu
Longji Xu
MLLM, VLM
286
9
0
13 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
242
3
0
09 Apr 2025
Semantic-guided Representation Learning for Multi-Label Recognition
Ruhui Zhang
Hezhe Qiao
Pengcheng Xu
Mingsheng Shang
Lin Chen
196
2
0
04 Apr 2025
ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu
Shanmukha Vellamchetti
Sathyanarayanan N. Aakur
EgoV
344
2
0
04 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
International Conference on Learning Representations (ICLR), 2025
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
243
5
0
03 Apr 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Computer Vision and Pattern Recognition (CVPR), 2025
Divya Velayudhan
A. Ahmed
Mohamad Alansari
Neha Gour
Abderaouf Behouch
...
Muzammal Naseer
Juergen Gall
Mohammed Bennamoun
Ernesto Damiani
Naoufel Werghi
281
2
0
03 Apr 2025
GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection
Xingyu Peng
Si Liu
Chen Gao
Yan Bai
Beipeng Mu
Xiaofei Wang
Huaxia Xia
297
2
0
26 Mar 2025
Anomize: Better Open Vocabulary Video Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2025
Fei Li
Wenxuan Liu
Jintai Chen
Ruixu Zhang
Longji Xu
Zhuo Zhou
Zheng Wang
281
3
0
23 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
International Conference on Learning Representations (ICLR), 2025
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD, VLM
998
4
0
14 Mar 2025