RONA: Pragmatically Diverse Image Captioning with Coherence Relations

14 March 2025
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
Abstract

Writing assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by employing syntactic and semantic variations to describe image components. However, human-written captions prioritize conveying a central message alongside visual descriptions, using pragmatic cues. Enhancing pragmatic diversity therefore requires exploring alternative ways of communicating these messages in conjunction with visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLMs) that leverages Coherence Relations as an axis for variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment than MLLM baselines across multiple domains. Our code is available at: this https URL

@article{ramakrishnan2025_2503.10997,
  title={RONA: Pragmatically Diverse Image Captioning with Coherence Relations},
  author={Aashish Anantha Ramakrishnan and Aadarsh Anantha Ramakrishnan and Dongwon Lee},
  journal={arXiv preprint arXiv:2503.10997},
  year={2025}
}