
v4 (latest)

RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models

23 June 2025
Yeongtak Oh
J. Mok
Juhyeon Shin
Sangha Park
Sungroh Yoon
    VLM
arXiv (abs) · PDF · HTML · HuggingFace · GitHub
Main: 10 pages · Bibliography: 4 pages · Appendix: 17 pages · 31 figures · 18 tables
Abstract

Recent multi-modal large language models (MLLMs) often struggle to generate personalized image captions, even when trained on high-quality captions. In this work, we observe that such limitations persist in existing post-training-based MLLM personalization methods. Specifically, despite being post-tuned with large-scale caption data through supervised fine-tuning (SFT), these models frequently fail to produce faithful descriptions in real-world scenarios, such as multi-concept image captioning. However, acquiring large-scale, high-quality captions for such complex settings is both costly and difficult. To address the data-centric nature of SFT, we propose a reinforcement learning (RL)-based post-training framework. To the best of our knowledge, this is the first RL-based approach to post-train MLLMs for personalized image captioning. Our method significantly enhances both visual recognition and personalized generation capabilities of MLLMs, and consistently outperforms existing SFT-based baselines, especially in the challenging multi-concept image captioning task.
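The abstract does not specify which RL algorithm or reward the framework uses, but a group-relative policy-gradient update of the kind commonly applied in MLLM post-training (e.g., GRPO-style) can be sketched as follows. Everything here is an illustrative assumption rather than the paper's implementation: the toy concept-mention reward, the `<bo>` personalized-concept token, and all function names are hypothetical.

```python
# A minimal GRPO-style post-training sketch for personalized captioning.
# Assumptions (not from the paper): the reward checks whether personalized
# concept identifiers appear in a sampled caption; advantages are normalized
# within the group of captions sampled for one image; the loss is a
# REINFORCE-style surrogate over sequence log-probabilities from the MLLM.
import numpy as np

def concept_reward(caption: str, concepts: list[str]) -> float:
    """Toy reward: fraction of personalized concept names mentioned verbatim."""
    caption = caption.lower()
    hits = sum(c.lower() in caption for c in concepts)
    return hits / max(len(concepts), 1)

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: normalize rewards within one prompt's group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def policy_gradient_loss(logprobs: np.ndarray, advantages: np.ndarray) -> float:
    """Surrogate loss: -mean(A * log pi(caption | image, prompt))."""
    return float(-(advantages * logprobs).mean())

# Example: four captions sampled for one image of <bo> (a user's dog).
captions = [
    "<bo> the dog plays fetch in the park.",
    "a dog runs across the grass.",
    "<bo> sits next to a red ball.",
    "two people walk along a path.",
]
rewards = np.array([concept_reward(c, ["<bo>"]) for c in captions])
logprobs = np.array([-12.3, -10.9, -11.5, -13.0])  # sequence log-probs from the policy
adv = grpo_advantages(rewards)
print("rewards:", rewards, "loss:", policy_gradient_loss(logprobs, adv))
```

The key contrast with SFT that the abstract draws is visible in this sketch: instead of requiring large-scale gold captions, the update only needs a verifiable reward signal over sampled captions, which sidesteps the cost of collecting high-quality multi-concept caption data.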
