Open-Set Image Tagging with Multi-Grained Text Supervision

23 October 2023
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
    VLM

Papers citing "Open-Set Image Tagging with Multi-Grained Text Supervision"

30 / 30 papers shown
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti
Massimiliano Mancini
Enrico Fini
Yiming Wang
Paolo Rota
Elisa Ricci
VLM
Presented at ResearchTrend Connect | VLM on 07 May 2025
74
0
0
27 Mar 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang
Alexandros Delitzas
Fangjinhua Wang
Ruida Zhang
Xiangyang Ji
Marc Pollefeys
Francis Engelmann
3DV
3DPC
45
3
0
24 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
VLM
ObjD
66
1
0
10 Mar 2025
Detailed Object Description with Controllable Dimensions
Xinran Wang
H. Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Z. Ma
Jun Guo
81
0
0
28 Nov 2024
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
Peng Cui
Guande He
Dan Zhang
Zhijie Deng
Yinpeng Dong
Jun Zhu
72
0
0
26 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
93
1
0
25 Nov 2024
Efficient Online Inference of Vision Transformers by Training-Free Tokenization
Leonidas Gee
Wing Yan Li
V. Sharmanska
Novi Quadrianto
ViT
85
0
0
23 Nov 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa
VLM
MLLM
30
17
0
24 Oct 2024
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
Mike Zhang
Kaixian Qu
Vaishakh Patil
César Cadena
Marco Hutter
LM&Ro
3DV
28
3
0
23 Sep 2024
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail
Bianca Lamm
Janis Keuper
36
2
0
28 Aug 2024
Evaluating Attribute Comprehension in Large Vision-Language Models
Haiwen Zhang
Zixi Yang
Yuanzhi Liu
Xinran Wang
Zheqi He
Kongming Liang
Zhanyu Ma
ELM
21
0
0
25 Aug 2024
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li
Zhengqiang Zhang
Chenhang He
Zhiyuan Ma
Vishal M. Patel
Lei Zhang
3DV
VLM
31
5
0
13 Jul 2024
Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu
Wenzong Ma
Zhenfu Pan
Hui Xiong
Junwei Liang
LM&Ro
22
7
0
26 Jun 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
VLM
CLIP
32
13
0
11 Jun 2024
Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models
Mazda Moayeri
Samyadeep Basu
S. Balasubramanian
Priyatham Kattakinda
Atoosa Malemir Chegini
R. Brauneis
S. Feizi
WIGM
36
4
0
11 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip H. S. Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
118
43
0
04 Apr 2024
OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality
Aditya Sharma
Luke Yoffe
Tobias Höllerer
19
8
0
17 Jan 2024
ChatGPT-Powered Hierarchical Comparisons for Image Classification
Zhiyuan Ren
Yiyang Su
Xiaoming Liu
VLM
40
21
0
01 Nov 2023
Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict
Sherzod Hakimov
Gullal Singh Cheema
11
3
0
22 Jun 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
59
73
0
10 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
201
318
0
08 Mar 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
115
151
0
20 Sep 2022
What does a platypus look like? Generating customized prompts for zero-shot image classification
Sarah M Pratt
Ian Covert
Rosanne Liu
Ali Farhadi
VLM
119
211
0
07 Sep 2022
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Bo Ren
Shutao Xia
VLM
68
49
0
05 Jul 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
223
897
0
28 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun
Seong Joon Oh
Byeongho Heo
Dongyoon Han
Junsuk Choe
Sanghyuk Chun
384
139
0
13 Jan 2021
Learning Deep Representations of Fine-grained Visual Descriptions
Scott E. Reed
Zeynep Akata
Bernt Schiele
Honglak Lee
OCL
VLM
160
841
0
17 May 2016