© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2401.17270
YOLO-World: Real-Time Open-Vocabulary Object Detection


30 January 2024
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
    VLM
    ObjD

Papers citing "YOLO-World: Real-Time Open-Vocabulary Object Detection"

Real-Time Privacy Preservation for Robot Visual Perception
Minkyu Choi
Yunhao Yang
N. Bhatt
Kushagra Gupta
Sahil Shah
Aditya Rai
David Fridovich-Keil
Ufuk Topcu
Sandeep P. Chinchali
25
0
0
08 May 2025
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
43
0
0
06 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
40
0
0
06 May 2025
Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
Zengli Luo
Canlong Zhang
Xiaochun Lu
Zhixin Li
Zhiwen Wang
24
0
0
06 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
57
0
0
01 May 2025
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
Y. Li
Lu Si
Y. T. Hou
Chengaung Liu
B. Li
Hongjian Fang
J. Zhang
71
0
0
30 Apr 2025
Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models
Patrick Müller
Alexander Braun
M. Keuper
50
0
0
25 Apr 2025
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
56
0
0
25 Apr 2025
A Decade of You Only Look Once (YOLO) for Object Detection
Leo Thomas Ramos
Angel D. Sappa
61
0
0
24 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
40
0
0
10 Apr 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
80
0
0
11 Mar 2025
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang
Yongchao Feng
Shuai Yang
Z. Liu
Qingjie Liu
Y. Wang
ObjD
81
0
0
08 Mar 2025
Robust Computer-Vision based Construction Site Detection for Assistive-Technology Applications
Junchi Feng
Giles Hamilton-Fletcher
Nikhil Ballem
Michael Batavia
Yifei Wang
Jiuling Zhong
Maurizio Porfiri
John-Ross Rizzo
45
0
0
06 Mar 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
55
7
0
24 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
51
0
0
23 Feb 2025
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Abdalwhab Abdalwhab
A. Imran
Sina Heydarian
I. Iordanova
David St-Onge
41
0
0
16 Jan 2025
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Wen-Dong Jiang
Chih-Yung Chang
Diptendu Sinha Roy
36
0
0
07 Jan 2025
RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba
Andong Lu
Wanyu Wang
Chenglong Li
Jin Tang
B. Luo
Mamba
49
2
0
31 Dec 2024
AI-Powered Urban Transportation Digital Twin: Methods and Applications
Xuan Di
Yongjie Fu
Mehmet K. Turkcan
Mahshid Ghasemi
Zhaobin Mo
Chengbo Zang
Abhishek Adhikari
Z. Kostić
Gil Zussman
AI4CE
29
0
0
30 Dec 2024
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao
Weiheng Zhong
Zhou Jiang
Dong Fang
Zhongyue Zhang
...
Fan Jia
Tiancai Wang
Haoqiang Fan
Osamu Yoshie
114
4
0
29 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
91
1
0
27 Nov 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Hua Zhang
Xiaochun Cao
FAtt
82
4
0
25 Nov 2024
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
Cheng-Chun Hsu
Bowen Wen
Jie Xu
Yashraj S. Narang
Xiaolong Wang
Yuke Zhu
Joydeep Biswas
Stan Birchfield
DiffM
32
8
0
01 Nov 2024
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Hao-Tang Tsui
Chien-Yao Wang
H. Liao
ObjD
VLM
39
0
0
20 Oct 2024
Reference-Based Post-OCR Processing with LLM for Precise Diacritic Text in Historical Document Recognition
T. Do
Dinh Phu Tran
An Vo
Daeyoung Kim
24
0
0
17 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
32
2
0
13 Oct 2024
Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking
Ayesha Ishaq
Mohamed El Amine Boudjoghra
Jean Lahoud
F. Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
47
1
0
02 Oct 2024
OW-Rep: Open World Object Detection with Instance Representation Learning
Sunoh Lee
Minsik Jeon
Jihong Min
Junwon Seo
ObjD
65
0
0
24 Sep 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
34
0
0
07 Sep 2024
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Hao Ding
Tuxun Lu
Yuqian Zhang
Ruixing Liang
Hongchao Shu
...
Bo Wang
Marcos Fernández-Rodríguez
Estevao Lima
João L. Vilaça
Mathias Unberath
55
4
0
16 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
58
4
0
09 Jul 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
F. Khan
VLM
ISeg
68
6
0
04 Jun 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
48
4
0
28 May 2024
Cross-domain Multi-modal Few-shot Object Detection via Rich Text
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
45
1
0
24 Mar 2024
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
39
46
0
01 Sep 2023
Virtual Guidance as a Mid-level Representation for Navigation with Augmented Reality
Hsuan-Kung Yang
Tsung-Chih Chiang
Tingxin Liu
Chun-Wei Huang
Jou-Min Liu
Tsu-Ching Hsiao
Chun-Yi Lee
13
1
0
05 Mar 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
115
151
0
20 Sep 2022
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
223
897
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding
X. Zhang
Ningning Ma
Jungong Han
Guiguang Ding
Jian-jun Sun
117
1,484
0
11 Jan 2021
PP-YOLO: An Effective and Efficient Implementation of Object Detector
Xiang Long
Kaipeng Deng
Guanzhong Wang
Yan Zhang
Qingqing Dang
...
Hui Shen
Jianguo Ren
Shumin Han
Errui Ding
Shilei Wen
ObjD
46
268
0
23 Jul 2020
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
166
21,643
0
09 Dec 2016
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
281
35,677
0
08 Jun 2015