ResearchTrend.AI

OmniCaptioner: One Captioner to Rule Them All

9 April 2025
Yiting Lu
Jiakang Yuan
Zhen Li
Shitian Zhao
Qi Qin
Xinyue Li
Le Zhuo
Licheng Wen
Dongyang Liu
Yuewen Cao
Xiangchao Yan
Xin Li
Tianshuo Peng
Shufei Zhang
Botian Shi
Tao Chen
Zhibo Chen
Lei Bai
Bo Zhang
Peng Gao
    MLLM
Abstract

We propose OmniCaptioner, a versatile visual captioning framework for generating fine-grained textual descriptions across a wide variety of visual domains. Unlike prior methods limited to specific image types (e.g., natural images or geometric visuals), our framework provides a unified solution for captioning natural images, visual text (e.g., posters, UIs, textbooks), and structured visuals (e.g., documents, tables, charts). By converting low-level pixel information into semantically rich textual representations, our framework bridges the gap between visual and textual modalities. Our results highlight three key advantages: (i) Enhanced Visual Reasoning with LLMs, where long-context captions of visual modalities empower LLMs, particularly the DeepSeek-R1 series, to reason effectively in multimodal scenarios; (ii) Improved Image Generation, where detailed captions improve tasks like text-to-image generation and image transformation; and (iii) Efficient Supervised Fine-Tuning (SFT), which enables faster convergence with less data. We believe the versatility and adaptability of OmniCaptioner can offer a new perspective for bridging the gap between language and visual modalities.

@article{lu2025_2504.07089,
  title={OmniCaptioner: One Captioner to Rule Them All},
  author={Yiting Lu and Jiakang Yuan and Zhen Li and Shitian Zhao and Qi Qin and Xinyue Li and Le Zhuo and Licheng Wen and Dongyang Liu and Yuewen Cao and Xiangchao Yan and Xin Li and Tianshuo Peng and Shufei Zhang and Botian Shi and Tao Chen and Zhibo Chen and Lei Bai and Bo Zhang and Peng Gao},
  journal={arXiv preprint arXiv:2504.07089},
  year={2025}
}