ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.03514
  4. Cited By
Recognize Anything: A Strong Image Tagging Model

Recognize Anything: A Strong Image Tagging Model

6 June 2023
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
Yanchun Xie
Yuzhuo Qin
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
    VLM
ArXivPDFHTML

Papers citing "Recognize Anything: A Strong Image Tagging Model"

40 / 40 papers shown
Title
Controllable Image Colorization with Instance-aware Texts and Masks
Controllable Image Colorization with Instance-aware Texts and Masks
Yanru An
Ling Gui
Qiang Hu
Chunlei Cai
Tianxiao Ye
Xiaoyun Zhang
Yanfeng Wang
DiffM
24
0
0
13 May 2025
Split Matching for Inductive Zero-shot Semantic Segmentation
Split Matching for Inductive Zero-shot Semantic Segmentation
Jialei Chen
Xu Zheng
Dongyue Li
Chong Yi
Seigo Ito
D. Paudel
Luc Van Gool
Hiroshi Murase
Daisuke Deguchi
VLM
50
0
0
08 May 2025
TSTMotion: Training-free Scene-aware Text-to-motion Generation
TSTMotion: Training-free Scene-aware Text-to-motion Generation
Ziyan Guo
Haoxuan Qu
Hossein Rahmani
Dewen Soh
Ping Hu
Qiuhong Ke
J. Liu
VGen
61
1
0
02 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution
GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution
Aditya Arora
Z. Tu
Y. Wang
Ruizheng Bai
Jian Wang
Sizhuo Ma
DiffM
61
0
0
01 May 2025
Step1X-Edit: A Practical Framework for General Image Editing
Step1X-Edit: A Practical Framework for General Image Editing
S. Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
X. Zhang
Gang Yu
Daxin Jiang
DiffM
93
3
0
24 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
X. Wang
Xiangnan He
61
0
0
22 Apr 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
121
6
0
25 Feb 2025
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
Yun Peng
Xiao Lin
Nachuan Ma
Jiayuan Du
Chuangwei Liu
Chengju Liu
Qi Chen
37
3
0
17 Feb 2025
Consistent Video Colorization via Palette Guidance
Consistent Video Colorization via Palette Guidance
Han Wang
Yuang Zhang
Yuhong Zhang
Lingxiao Lu
Li-Na Song
DiffM
VGen
86
0
0
31 Jan 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
83
0
0
25 Jan 2025
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song
Weixing Chen
Y. Liu
Weikai Chen
Guanbin Li
Liang Lin
119
3
0
12 Dec 2024
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
36
3
0
23 Oct 2024
Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Zhijie Yan
Shufei Li
Z. Wang
Lixiu Wu
Han Wang
Jun Zhu
Lijiang Chen
Jihong Liu
32
1
0
15 Oct 2024
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Yanyuan Qiao
Wenqi Lyu
Hui Wang
Zixu Wang
Zerui Li
Yuan Zhang
Mingkui Tan
Qi Wu
LRM
36
2
0
27 Sep 2024
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan
Vaibhav Agrawal
Vineeth Bhat
Sarthak Chittawar
Siddharth Srivastava
Chetan Arora
K. M. Krishna
80
0
0
18 Sep 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
37
0
0
07 Sep 2024
Segment Any Mesh
Segment Any Mesh
George Tang
William Zhao
Logan Ford
David Benhaim
Paul Zhang
27
8
0
24 Aug 2024
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
Guofeng Mei
Luigi Riz
Yiming Wang
Fabio Poiesi
ISeg
VLM
59
3
0
20 Aug 2024
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World
  Image Super-Resolution
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution
Aiwen Jiang
Zhi Wei
Long Peng
Feiqiang Liu
Wenbo Li
Mingwen Wang
DiffM
40
2
0
24 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
38
25
0
17 Jun 2024
Visual Language Model based Cross-modal Semantic Communication Systems
Visual Language Model based Cross-modal Semantic Communication Systems
Feibo Jiang
Chuanguo Tang
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
VLM
31
2
0
06 May 2024
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin
Yunshuang Nie
Ziming Wei
Jiaqi Chen
Shikui Ma
Jianhua Han
Hang Xu
Xiaojun Chang
Xiaodan Liang
LM&Ro
LRM
60
19
0
12 Mar 2024
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Yunpeng Qu
Kun Yuan
Kai Zhao
Qizhi Xie
Jinhua Hao
Ming-hui Sun
Chao Zhou
27
16
0
08 Mar 2024
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Ci-Siang Lin
Chien-Yi Wang
Yu-Chiang Frank Wang
Min-Hung Chen
VLM
21
0
0
22 Jan 2024
TransMed: Large Language Models Enhance Vision Transformer for
  Biomedical Image Classification
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
Kaipeng Zheng
Weiran Huang
Lichao Sun
LM&MA
MedIm
VLM
25
0
0
12 Dec 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
17
34
0
22 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
47
103
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
24
12
0
09 Nov 2023
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and
  Planning
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Yuanyi Zhong
Alihusein Kuwajerwala
Sacha Morin
Krishna Murthy Jatavallabhula
Bipasha Sen
...
Celso Miguel de Melo
Joshua B. Tenenbaum
Antonio Torralba
Florian Shkurti
Liam Paull
LM&Ro
27
166
0
28 Sep 2023
Bridging Zero-shot Object Navigation and Foundation Models through
  Pixel-Guided Navigation Skill
Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill
Wenzhe Cai
Siyuan Huang
Guangran Cheng
Yuxing Long
Peng Gao
Changyin Sun
Hao Dong
LM&Ro
19
41
0
19 Sep 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
117
0
25 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
33
106
0
17 Jul 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
61
74
0
10 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion
  Models
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
209
318
0
08 Mar 2023
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge
  Transfer
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Bo Ren
Shutao Xia
VLM
68
49
0
05 Jul 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Semantic Understanding of Scenes through the ADE20K Dataset
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
249
1,821
0
18 Aug 2016
1