Detecting Twenty-thousand Classes using Image-level Supervision

European Conference on Computer Vision (ECCV), 2022
7 January 2022
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIP, VLM
ArXiv (abs) | PDF | HTML | GitHub (1950★)

Papers citing "Detecting Twenty-thousand Classes using Image-level Supervision"

50 / 520 papers shown
HELIOS: Hierarchical Exploration for Language-Grounded Interaction in Open Scenes
Katrina Ashton
Chahyon Ku
Shrey Shah
W. Jiang
Kostas Daniilidis
Bernadette Bucher
LM&Ro
163
0
0
30 Mar 2026
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
Lorenzo Bianchi
Giacomo Pacini
F. Carrara
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
3DV, VLM
223
1
0
30 Mar 2026
FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
Chengyang He
Ge Sun
Yue Bai
Junkai Lu
Jiadong Zhao
Guillaume Sartoretti
204
1
0
04 Dec 2025
OpenBox: Annotate Any Bounding Boxes in 3D
In-Jae Lee
Mungyeom Kim
Kwonyoung Ryu
Pierre Musacchio
Jaesik Park
3DPC
153
2
0
01 Dec 2025
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
Silin Cheng
Kai Han
MLLM, VPVLM, VLM
349
3
0
27 Nov 2025
OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection
Chujie Wang
Jianyu Lu
Zhiyuan Luo
Xi Chen
Chu He
LM&Ro
301
0
0
26 Nov 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
179
2
0
25 Nov 2025
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
Tooba Tehreem Sheikh
Jean Lahoud
Rao Muhammad Anwer
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
ObjD, MedIm, VLM
379
1
0
25 Nov 2025
State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection
Jiaying Zhou
Qingchao Chen
157
0
0
22 Nov 2025
ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation
Simon Boeder
Fabian Gigengack
Simon Roesler
Holger Caesar
Benjamin Risse
137
2
0
19 Nov 2025
GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding
Athul M. Mathew
Haithem Hermassi
Thariq Khalid
Arshad Ali Khan
R. Souissi
164
1
0
09 Nov 2025
IDALC: A Semi-Supervised Framework for Intent Detection and Active Learning based Correction
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2025
Ankan Mullick
Sukannya Purkayastha
Saransh Sharma
Pawan Goyal
Niloy Ganguly
184
0
0
08 Nov 2025
HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment
Zecheng Yin
H. Vicky Zhao
Zhen Li
186
0
0
27 Oct 2025
Towards 3D Objectness Learning in an Open World
Taichi Liu
Zhenyu Wang
Ruofeng Liu
Guang Wang
Desheng Zhang
3DPC, VLM
216
0
0
20 Oct 2025
Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Jihoon Kwon
Kyle Min
Jy-yong Sohn
CoGe
212
1
0
18 Oct 2025
CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD, LRM
451
1
0
16 Oct 2025
Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity
MingZe Tang
Jubal Chandy Jacob
VLM
156
1
0
15 Oct 2025
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Weikai Huang
Jieyu Zhang
Taoyang Jia
Chenhao Zheng
Ziqi Gao
J. S. Park
Winson Han
Ranjay Krishna
291
0
0
10 Oct 2025
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
IEEE Access, 2025
Kento Kawaharazuka
Jihoon Oh
Jun Yamada
Ingmar Posner
Yuke Zhu
LM&Ro
416
59
0
08 Oct 2025
Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition
Ranjan Sapkota
Manoj Karkee
ObjD, MU
335
20
0
06 Oct 2025
General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks
Fahim Shahriar
Cheryl Wang
Alireza Azimi
Gautham Vasan
Hany Hamed Elanwar
A. Rupam Mahmood
Colin Bellinger
180
0
0
06 Oct 2025
Cross-View Open-Vocabulary Object Detection in Aerial Imagery
Jyoti Kini
Rohit Gupta
Mubarak Shah
ObjD, VLM
275
1
0
04 Oct 2025
VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors
Atif Belal
H. R. Medeiros
M. Pedersoli
Eric Granger
ObjD, VLM, TTA
165
1
0
01 Oct 2025
C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection
Siheng Wang
Zhengdao Li
Yanshu Li
Canran Xiao
Haibo Zhan
...
Zhikang Dong
Jifeng Shen
Junhao Dong
Qiang Sun
Piotr Koniusz
ObjD, VLM
336
6
0
27 Sep 2025
Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
Vahid Mirjalili
Ramin Giahi
Sriram Kollipara
Akshay Kekuda
Kehui Yao
...
Kaushiki Nag
Sinduja Subramaniam
Topojoy Biswas
Evren Körpeoglu
Kannan Achan
VLM, LRM
135
0
0
26 Sep 2025
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
Debargha Ganguly
Sumit Kumar
Ishwar B Balappanawar
Weicong Chen
Shashank Kambhatla
Srinivasan Iyengar
Shivkumar Kalyanaraman
Ponnurangam Kumaraguru
Vipin Chaudhary
VLM
244
2
0
26 Sep 2025
MVP: Motion Vector Propagation for Zero-Shot Video Object Detection
Binhua Huang
Ni Wang
Wendong Yao
Soumyabrata Dev
ObjD, VLM
156
0
0
22 Sep 2025
Sparse Multiview Open-Vocabulary 3D Detection
Olivier Moliner
Viktor Larsson
Kalle Åström
152
0
0
19 Sep 2025
Pre-Manipulation Alignment Prediction with Parallel Deep State-Space and Transformer Models
Motonari Kambara
Komei Sugiura
273
0
0
17 Sep 2025
Toward Ownership Understanding of Objects: Active Question Generation with Large Language Model and Probabilistic Generative Model
Saki Hashimoto
Shoichi Hasegawa
Tomochika Ishikawa
Akira Taniguchi
Y. Hagiwara
Lotfi El Hafi
Tadahiro Taniguchi
LM&Ro
153
0
0
16 Sep 2025
GBPP: Grasp-Aware Base Placement Prediction for Robots via Two-Stage Learning
Jizhuo Chen
Diwen Liu
Jiaming Wang
Harold Soh
186
0
0
15 Sep 2025
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota
Ryo Hachiuma
Boyi Li
Ximing Lu
Michael Ross Boone
...
Marco Pavone
Yu-Chun Wang
Noa Garcia
Yuta Nakashima
Chao-Han Huck Yang
236
2
0
09 Sep 2025
Harnessing Object Grounding for Time-Sensitive Video Understanding
Tz-Ying Wu
S. N. Sridhar
Subarna Tripathi
248
0
0
08 Sep 2025
Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting
Sen Wang
Kunyi Li
Siyun Liang
Elena Alegret
Jing Ma
Nassir Navab
Stefano Gasperini
3DGS
177
1
0
05 Sep 2025
OVGrasp: Open-Vocabulary Grasping Assistance via Multimodal Intent Detection
Chen Hu
Shan Luo
Letizia Gionfrida
129
0
0
04 Sep 2025
InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System
Xianbao Hou
Yonghao He
Zeyd Boukhers
John See
Hu Su
Wei Sui
Cong Yang
DiffM, VLM
177
1
0
03 Sep 2025
GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
Kei Katsumata
Yui Iioka
Naoki Hosomi
Teruhisa Misu
Kentaro Yamada
K. Sugiura
174
0
0
28 Aug 2025
Context-Aware Risk Estimation in Home Environments: A Probabilistic Framework for Service Robots
Sena Ishii
A. Chikhalikar
Ankit A. Ravankar
J. V. S. Luces
Y. Hirata
188
1
0
27 Aug 2025
OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations
Peng-Hao Hsu
Ke Zhang
Fu-En Wang
Tao Tu
Ming-feng Li
Yu-Lun Liu
Albert Y. C. Chen
Min Sun
Cheng-Hao Kuo
3DPC, VLM
154
6
0
27 Aug 2025
Robust and Label-Efficient Deep Waste Detection
Hassan Abid
Khan Muhammad
M. H. Khan
HAI, VLM
176
0
0
26 Aug 2025
Take That for Me: Multimodal Exophora Resolution with Interactive Questioning for Ambiguous Out-of-View Instructions
Akira Oyama
Shoichi Hasegawa
Akira Taniguchi
Y. Hagiwara
Tadahiro Taniguchi
129
2
0
22 Aug 2025
Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception
Junjie Wang
Keyu Chen
Yulin Li
Bin Chen
Hengshuang Zhao
Xiaojuan Qi
Zhuotao Tian
CLIP, VLM
176
4
0
15 Aug 2025
Composed Object Retrieval: Object-level Retrieval via Composed Expressions
Tong Wang
Guanyu Yang
Nian Liu
Zongyan Han
Jinxing Zhou
Salman Khan
Fahad Shahbaz Khan
267
0
0
06 Aug 2025
Weakly-Supervised Image Forgery Localization via Vision-Language Collaborative Reasoning Framework
Ziqi Sheng
Junyan Wu
Wei Lu
Jiantao Zhou
WSOL
421
2
0
02 Aug 2025
ODOV: Towards Open-Domain Open-Vocabulary Object Detection
Yupeng Zhang
Ruize Han
Fangnan Zhou
Song Wang
Wei Feng
Liang Wan
ObjD, VLM
260
1
0
02 Aug 2025
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang
Luigi Piccinelli
Mattia Segu
Siyuan Li
Rui Huang
Yuqian Fu
Marc Pollefeys
Hermann Blum
Z. Bauer
3DPC
310
8
0
31 Jul 2025
Details Matter for Indoor Open-vocabulary 3D Instance Segmentation
Sanghun Jung
Jingjing Zheng
Ke Zhang
Nan Qiao
Albert Y. C. Chen
...
Xiao Zeng
Hsiang-Wei Huang
Byron Boots
Min Sun
Cheng-Hao Kuo
299
4
0
30 Jul 2025
When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
Matin Aghaei
Lingfeng Zhang
Mohammad Ali Alomrani
Mahdi Biparva
Yingxue Zhang
LRM
190
0
0
26 Jul 2025
Open World Object Detection: A Survey
Yiming Li
Yi Wang
Wenqian Wang
Dan Lin
Bingbing Li
Kim-Hui Yap
ObjD
479
27
0
01 Jul 2025
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models
Puhao Li
Yingying Wu
Ziheng Xi
Wanlin Li
Yuzhe Huang
...
Yinghan Chen
Jianan Wang
Song-Chun Zhu
Tengyu Liu
Siyuan Huang
LM&Ro
253
26
0
19 Jun 2025