Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

28 May 2025
Xudong Li
Mengdan Zhang
Peixian Chen
Xiawu Zheng
Yan Zhang
Jingyuan Zheng
Yunhang Shen
Ke Li
Chaoyou Fu
Xing Sun
Rongrong Ji
Abstract

Multi-modal Large Language Models (MLLMs) excel at single-image tasks but struggle with multi-image understanding due to cross-modal misalignment, leading to hallucinations (context omission, conflation, and misinterpretation). Existing methods using Direct Preference Optimization (DPO) constrain optimization to a solitary image reference within the input sequence, neglecting holistic context modeling. We propose Context-to-Cue Direct Preference Optimization (CcDPO), a multi-level preference optimization framework that enhances per-image perception in multi-image settings by zooming into visual cues -- from sequential context to local details. It features: (i) Context-Level Optimization: re-evaluates cognitive biases underlying MLLMs' multi-image context comprehension and integrates a spectrum of low-cost global sequence preferences for bias mitigation. (ii) Needle-Level Optimization: directs attention to fine-grained visual details through region-targeted visual prompts and multimodal preference supervision. To support scalable optimization, we also construct MultiScope-42k, an automatically generated dataset with high-quality multi-level preference pairs. Experiments show that CcDPO significantly reduces hallucinations and yields consistent performance gains across general single- and multi-image tasks.
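As a rough illustration of the multi-level idea described above, the sketch below combines a standard DPO objective computed on context-level preference pairs with a second DPO term on needle-level (fine-grained) pairs, weighted by a coefficient lam. All function names, the tuple layout, and the weighting scheme are assumptions made here for illustration; the paper's actual CcDPO objective and data format may differ.

import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO loss on (chosen, rejected) sequence log-probabilities,
    # with a frozen reference model providing ref_logp_* terms.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

def multi_level_objective(context_pair, needle_pair, lam=1.0):
    # Hypothetical combination of a context-level and a needle-level DPO term;
    # each *_pair is (logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected).
    l_context = dpo_loss(*context_pair)
    l_needle = dpo_loss(*needle_pair)
    return l_context + lam * l_needle

# Toy usage with dummy per-example log-probabilities (batch of 8).
ctx = tuple(torch.randn(8) for _ in range(4))
ndl = tuple(torch.randn(8) for _ in range(4))
loss = multi_level_objective(ctx, ndl, lam=0.5)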

View on arXiv
@article{li2025_2505.22396,
  title={Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs},
  author={Xudong Li and Mengdan Zhang and Peixian Chen and Xiawu Zheng and Yan Zhang and Jingyuan Zheng and Yunhang Shen and Ke Li and Chaoyou Fu and Xing Sun and Rongrong Ji},
  journal={arXiv preprint arXiv:2505.22396},
  year={2025}
}