Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.15200
Cited By
Open-Set Image Tagging with Multi-Grained Text Supervision
23 October 2023
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Open-Set Image Tagging with Multi-Grained Text Supervision"
30 / 30 papers shown
Title
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti
Massimiliano Mancini
Enrico Fini
Yiming Wang
Paolo Rota
Elisa Ricci
VLM
Presented at
ResearchTrend Connect | VLM
on
07 May 2025
72
0
0
27 Mar 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang
Alexandros Delitzas
Fangjinhua Wang
Ruida Zhang
Xiangyang Ji
Marc Pollefeys
Francis Engelmann
3DV
3DPC
45
3
0
24 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
VLM
ObjD
64
1
0
10 Mar 2025
Detailed Object Description with Controllable Dimensions
Xinran Wang
H. Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Z. Ma
Jun Guo
81
0
0
28 Nov 2024
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
Peng Cui
Guande He
Dan Zhang
Zhijie Deng
Yinpeng Dong
Jun Zhu
69
0
0
26 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
Sipeng Zheng
Zongqing Lu
91
1
0
25 Nov 2024
Efficient Online Inference of Vision Transformers by Training-Free Tokenization
Leonidas Gee
Wing Yan Li
V. Sharmanska
Novi Quadrianto
ViT
82
0
0
23 Nov 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa
VLM
MLLM
30
17
0
24 Oct 2024
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
Mike Zhang
Kaixian Qu
Vaishakh Patil
César Cadena
Marco Hutter
LM&Ro
3DV
25
3
0
23 Sep 2024
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail
Bianca Lamm
Janis Keuper
31
2
0
28 Aug 2024
Evaluating Attribute Comprehension in Large Vision-Language Models
Haiwen Zhang
Zixi Yang
Yuanzhi Liu
Xinran Wang
Zheqi He
Kongming Liang
Zhanyu Ma
ELM
16
0
0
25 Aug 2024
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li
Zhengqiang Zhang
Chenhang He
Zhiyuan Ma
Vishal M. Patel
Lei Zhang
3DV
VLM
29
5
0
13 Jul 2024
Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu
Wenzong Ma
Zhenfu Pan
Hui Xiong
Junwei Liang
LM&Ro
22
7
0
26 Jun 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
VLM
CLIP
32
13
0
11 Jun 2024
Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models
Mazda Moayeri
Samyadeep Basu
S. Balasubramanian
Priyatham Kattakinda
Atoosa Malemir Chegini
R. Brauneis
S. Feizi
WIGM
34
4
0
11 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip H. S. Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
109
43
0
04 Apr 2024
OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality
Aditya Sharma
Luke Yoffe
Tobias Höllerer
14
8
0
17 Jan 2024
ChatGPT-Powered Hierarchical Comparisons for Image Classification
Zhiyuan Ren
Yiyang Su
Xiaoming Liu
VLM
40
21
0
01 Nov 2023
Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict
Sherzod Hakimov
Gullal Singh Cheema
11
3
0
22 Jun 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
59
73
0
10 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
198
318
0
08 Mar 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
115
151
0
20 Sep 2022
What does a platypus look like? Generating customized prompts for zero-shot image classification
Sarah M Pratt
Ian Covert
Rosanne Liu
Ali Farhadi
VLM
116
211
0
07 Sep 2022
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Bo Ren
Shutao Xia
VLM
65
49
0
05 Jul 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
220
897
0
28 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun
Seong Joon Oh
Byeongho Heo
Dongyoon Han
Junsuk Choe
Sanghyuk Chun
384
139
0
13 Jan 2021
Learning Deep Representations of Fine-grained Visual Descriptions
Scott E. Reed
Zeynep Akata
Bernt Schiele
Honglak Lee
OCL
VLM
160
841
0
17 May 2016
1