Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

International Conference on Learning Representations (ICLR), 2022

28 April 2021

Papers citing "Open-vocabulary Object Detection via Vision and Language Knowledge Distillation"

50 / 745 papers shown

BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing

...

30 Mar 2026

SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection

243

04 Dec 2025

FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

195

04 Dec 2025

VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models

Silin Cheng

Kai Han

MLLM VPVLM VLM

338

27 Nov 2025

OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection

298

26 Nov 2025

ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis

199

25 Nov 2025

From Reviewers' Lens: Understanding Bug Bounty Report Invalid Reasons with LLMs

161

23 Nov 2025

VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection

311

22 Nov 2025

State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection

Jiaying Zhou

Qingchao Chen

150

22 Nov 2025

Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning

197

22 Nov 2025

MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization

Zhenying Fang

Richang Hong

204

17 Nov 2025

Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation

174

08 Nov 2025

Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Tiny Object Detection

458

07 Nov 2025

In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy

Shreyan Ganguly

Angona Biswas

Jaydeep Rade

Md Hasibul Hasan Hasib

Nabila Masud

...

Ushashi Bhattacharjee

273

04 Nov 2025

A Retrospect to Multi-prompt Learning across Vision and Language

IEEE International Conference on Computer Vision (ICCV), 2023

454

31 Oct 2025

Test-Time Adaptive Object Detection with Foundation Model

399

29 Oct 2025

ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models

Pranav Saxena

Jimmy Chiun

VLM

136

24 Oct 2025

A Training-Free Framework for Open-Vocabulary Image Segmentation and Recognition with EfficientNet and CLIP

Ying Dai

Wei Yu Chen

ObjD VLM

323

22 Oct 2025

Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents

165

21 Oct 2025

On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration

528

20 Oct 2025

Towards 3D Objectness Learning in an Open World

207

20 Oct 2025

Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions

209

18 Oct 2025

TeamFormer: Shallow Parallel Transformers with Progressive Approximation

Wei Wang

Xiao-Yong Wei

Qing Li

135

17 Oct 2025

CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection

433

16 Oct 2025

Cluster-Aware Prompt Ensemble Learning for Few-Shot Vision-Language Model Adaptation

Pattern Recognition (Pattern Recogn.), 2025

233

10 Oct 2025

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding

284

10 Oct 2025

Vision Language Models: A Survey of 26K Papers

Fengming Lin

3DV VLM

164

10 Oct 2025

FOLK: Fast Open-Vocabulary 3D Instance Segmentation via Label-guided Knowledge Distillation

186

09 Oct 2025

A Multimodal Depth-Aware Method For Embodied Reference Understanding

375

09 Oct 2025

MedCLM: Learning to Localize and Reason via a CoT-Curriculum in Medical Vision-Language Models

154

06 Oct 2025

Cross-View Open-Vocabulary Object Detection in Aerial Imagery

249

04 Oct 2025

Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models

334

03 Oct 2025

VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors

160

01 Oct 2025

Adaptive Event Stream Slicing for Open-Vocabulary Event-Based Object Detection via Vision-Language Knowledge Distillation

165

01 Oct 2025

Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection

181

29 Sep 2025

FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology

157

29 Sep 2025

Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives

148

28 Sep 2025

C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection

...

327

27 Sep 2025

LAGEA: Language Guided Embodied Agents for Robotic Manipulation

Abdul Monaf Chowdhury

Akm Moshiur Rahman Mazumder

Rabeya Akter

S. Arib

LM&Ro

179

27 Sep 2025

Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning

228

27 Sep 2025

Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding

...

135

26 Sep 2025

Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning

...

163

24 Sep 2025

Knowledge Transfer from Interaction Learning

164

23 Sep 2025

COLA: Context-aware Language-driven Test-time Adaptation

IEEE Transactions on Image Processing (IEEE TIP), 2025

308

22 Sep 2025

MVP: Motion Vector Propagation for Zero-Shot Video Object Detection

154

22 Sep 2025

Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation

318

18 Sep 2025

MOCHA: Multi-modal Objects-aware Cross-arcHitecture Alignment

387

17 Sep 2025

When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection

Rabin Dulal

Lihong Zheng

M. A. Kabir

139

08 Sep 2025

Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding

194

08 Sep 2025

AttriPrompt: Dynamic Prompt Composition Learning for CLIP

184

07 Sep 2025