Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

International Conference on Learning Representations (ICLR), 2021
28 April 2021
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM, ObjD

Papers citing "Open-vocabulary Object Detection via Vision and Language Knowledge Distillation"

50 / 745 papers shown
BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing
Fangxun Liu
S M Rayeed
Samuel Stevens
Alyson East
Cheng Hsuan Chiang
...
Eric Sokol
Michael Belitz
Sydne Record
Charles V. Stewart
Wei-Lun Chao
92
3
0
30 Mar 2026
SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection
Qing Xu
Yanqian Wang
Xiangjian He
Yue Li
Yixuan Zhang
Rong Qu
Wenting Duan
Zhen Chen
MedIm
243
0
0
04 Dec 2025
FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
Chengyang He
Ge Sun
Yue Bai
Junkai Lu
Jiadong Zhao
Guillaume Sartoretti
195
1
0
04 Dec 2025
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
Silin Cheng
Kai Han
MLLM, VPVLM, VLM
338
3
0
27 Nov 2025
OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection
Chujie Wang
Jianyu Lu
Zhiyuan Luo
Xi Chen
Chu He
LM&Ro
298
0
0
26 Nov 2025
ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis
Advik Sinha
Saurabh Atreya
Aashutosh A V
Sk Aziz Ali
Abhijit Das
CLIP
199
0
0
25 Nov 2025
From Reviewers' Lens: Understanding Bug Bounty Report Invalid Reasons with LLMs
Jiangrui Zheng
Yingming Zhou
Ali Abdullah Ahmad
Hanqing Yao
Xueqing Liu
161
0
0
23 Nov 2025
VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection
Jianhang Yao
Yongbin Zheng
Siqi Lu
Wanying Xu
Peng Sun
ObjD, VLM
311
0
0
22 Nov 2025
State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection
Jiaying Zhou
Qingchao Chen
150
0
0
22 Nov 2025
Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning
Xiaohong Liu
Xiufeng Song
Huayu Zheng
Lei Bai
Xiaoming Liu
Guangtao Zhai
DiffM
197
0
0
22 Nov 2025
MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization
Zhenying Fang
Richang Hong
204
0
0
17 Nov 2025
Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation
Lin Li
Chuhan Zhang
Dong Zhang
Chong Sun
Chen Li
L. Chen
174
0
0
08 Nov 2025
Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Tiny Object Detection
Xian-Hong Huang
Hui-Kai Su
Chi-Chia Sun
Jun-Wei Hsieh
ObjD
458
0
0
07 Nov 2025
In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy
Shreyan Ganguly
Angona Biswas
Jaydeep Rade
Md Hasibul Hasan Hasib
Nabila Masud
...
Ushashi Bhattacharjee
Aditya Balu
A. Sarkar
A. Krishnamurthy
Soumik Sarkar
ObjD, VLM
273
0
0
04 Nov 2025
A Retrospect to Multi-prompt Learning across Vision and Language
IEEE International Conference on Computer Vision (ICCV), 2023
Ziliang Chen
Xin Huang
Quanlong Guan
Liang Lin
Weiqi Luo
VPVLM, VLM
454
12
0
31 Oct 2025
Test-Time Adaptive Object Detection with Foundation Model
Yingjie Gao
Yanan Zhang
Zhi Cai
Di Huang
VLM, TTA
399
2
0
29 Oct 2025
ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models
Pranav Saxena
Jimmy Chiun
VLM
136
1
0
24 Oct 2025
A Training-Free Framework for Open-Vocabulary Image Segmentation and Recognition with EfficientNet and CLIP
Ying Dai
Wei Yu Chen
ObjD, VLM
323
0
0
22 Oct 2025
Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents
Yiqi Lin
Alex Jinpeng Wang
Linjie Li
Zhengyuan Yang
Mike Zheng Shou
165
1
0
21 Oct 2025
On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration
Yehonathan Refael
Amit Aides
Aviad Barzilai
George Leifman
Genady Beryozkin
Vered Silverman
Bolous Jaber
Tomer Shekel
ObjD
528
0
0
20 Oct 2025
Towards 3D Objectness Learning in an Open World
Taichi Liu
Zhenyu Wang
Ruofeng Liu
Guang Wang
Desheng Zhang
3DPC, VLM
207
0
0
20 Oct 2025
Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Jihoon Kwon
Kyle Min
Jy-yong Sohn
CoGe
209
1
0
18 Oct 2025
TeamFormer: Shallow Parallel Transformers with Progressive Approximation
Wei Wang
Xiao-Yong Wei
Qing Li
135
0
0
17 Oct 2025
CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD, LRM
433
1
0
16 Oct 2025
Cluster-Aware Prompt Ensemble Learning for Few-Shot Vision-Language Model Adaptation
Pattern Recognition (Pattern Recogn.), 2025
Zhi Chen
Xin Yu
Xiaohui Tao
Yan Li
Zi Huang
VLM
233
11
0
10 Oct 2025
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Weikai Huang
Jieyu Zhang
Taoyang Jia
Chenhao Zheng
Ziqi Gao
J. S. Park
Winson Han
Ranjay Krishna
284
0
0
10 Oct 2025
Vision Language Models: A Survey of 26K Papers
Fengming Lin
3DV, VLM
164
0
0
10 Oct 2025
FOLK: Fast Open-Vocabulary 3D Instance Segmentation via Label-guided Knowledge Distillation
Hongrui Wu
Zhicheng Gao
Jin Cao
Kelu Yao
Wen Shen
Zhihua Wei
VLM
186
0
0
09 Oct 2025
A Multimodal Depth-Aware Method For Embodied Reference Understanding
Fevziye Irem Eyiokur
Dogucan Yaman
H. K. Ekenel
Alexander Waibel
ObjD
375
0
0
09 Oct 2025
MedCLM: Learning to Localize and Reason via a CoT-Curriculum in Medical Vision-Language Models
Soo Yong Kim
Suin Cho
Vincent-Daniel Yun
Gyeongyeon Hwang
LRM
154
0
0
06 Oct 2025
Cross-View Open-Vocabulary Object Detection in Aerial Imagery
Jyoti Kini
Rohit Gupta
Mubarak Shah
ObjD, VLM
249
1
0
04 Oct 2025
Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models
Lihua Zhou
Mao Ye
Shuaifeng Li
Nianxin Li
Jinlin Wu
X. Zhu
Lei Deng
Hongbin Liu
Jiebo Luo
Zhen Lei
BDL, VLM, TTA
334
0
0
03 Oct 2025
VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors
Atif Belal
H. R. Medeiros
M. Pedersoli
Eric Granger
ObjD, VLM, TTA
160
0
0
01 Oct 2025
Adaptive Event Stream Slicing for Open-Vocabulary Event-Based Object Detection via Vision-Language Knowledge Distillation
Jinchang Zhang
Zijun Li
Jiakai Lin
Guoyu Lu
ObjD, VLM
165
4
0
01 Oct 2025
Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
Sojung An
Kwanyong Park
Yong Jae Lee
Donghyun Kim
181
0
0
29 Sep 2025
FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
Faizan Farooq Khan
Yousef Radwan
Eslam Abdelrahman
Abdulwahab Felemban
Aymen Mir
Nico K. Michiels
Andrew J. Temple
M. Berumen
Mohamed Elhoseiny
157
0
0
29 Sep 2025
Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives
Kuanrong Liu
Siyuan Liang
Cheng Qian
Ming Zhang
Xiaochun Cao
AAML, VLM
148
0
0
28 Sep 2025
C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection
Siheng Wang
Zhengdao Li
Yanshu Li
Canran Xiao
Haibo Zhan
...
Zhikang Dong
Jifeng Shen
Junhao Dong
Qiang Sun
Piotr Koniusz
ObjD, VLM
327
6
0
27 Sep 2025
LAGEA: Language Guided Embodied Agents for Robotic Manipulation
Abdul Monaf Chowdhury
Akm Moshiur Rahman Mazumder
Rabeya Akter
S. Arib
LM&Ro
179
1
0
27 Sep 2025
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Y Samuel Wang
Zeyu Xue
Mujie Liu
Tongqin Zhang
Yan Hu
Zhou Zhao
Chenguang Yang
Zhenyu Lu
228
0
0
27 Sep 2025
Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
Vahid Mirjalili
Ramin Giahi
Sriram Kollipara
Akshay Kekuda
Kehui Yao
...
Kaushiki Nag
Sinduja Subramaniam
Topojoy Biswas
Evren Körpeoglu
Kannan Achan
VLM, LRM
135
0
0
26 Sep 2025
Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning
Xun Li
Rodrigo Santa Cruz
Mingze Xi
Hu Zhang
Madhawa Perera
...
Brandon J. Matthews
Feng Xu
Matt Adcock
Dadong Wang
Jiajun Liu
163
2
0
24 Sep 2025
Knowledge Transfer from Interaction Learning
Yilin Gao
Kangyi Chen
Zhongxing Peng
Hengjie Lu
Shugong Xu
VLM
164
1
0
23 Sep 2025
COLA: Context-aware Language-driven Test-time Adaptation
IEEE Transactions on Image Processing (IEEE TIP), 2025
Aiming Zhang
Tianyuan Yu
Liang Bai
Jun Tang
Yanming Guo
Yirun Ruan
Yun Zhou
Zhihe Lu
TTA, VLM
308
0
0
22 Sep 2025
MVP: Motion Vector Propagation for Zero-Shot Video Object Detection
Binhua Huang
Ni Wang
Wendong Yao
Soumyabrata Dev
ObjD, VLM
154
0
0
22 Sep 2025
Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation
Silvio Mazzucco
Carl Persson
Mattia Segu
Pier Luigi Dovesi
Federico Tombari
Luc Van Gool
Matteo Poggi
VLM
318
1
0
18 Sep 2025
MOCHA: Multi-modal Objects-aware Cross-arcHitecture Alignment
Elena Camuffo
F. Barbato
Mete Ozay
Simone Milani
Umberto Michieli
ObjD
387
1
0
17 Sep 2025
When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection
Rabin Dulal
Lihong Zheng
M. A. Kabir
139
1
0
08 Sep 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
194
0
0
08 Sep 2025
AttriPrompt: Dynamic Prompt Composition Learning for CLIP
Qiqi Zhan
Shiwei Li
Qingjie Liu
Yunhong Wang
VLM
184
2
0
07 Sep 2025