Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models

23 March 2025
Qiao Liang
Yanjiang Liu
Ben He
Yaojie Lu
Hongyu Lin
Jia Zheng
Xianpei Han
Le Sun
Yingfei Sun
Abstract

Does the prior knowledge of the vision encoder constrain the capability boundary of Multi-modal Large Language Models (MLLMs)? While most existing research treats MLLMs as unified systems optimized through end-to-end training, the impact of the vision encoder's prior knowledge is seldom investigated. In this work, we introduce a novel metric, Rank_e, to quantify the effect of the vision encoder's prior knowledge on MLLM performance. Our analysis reveals a positive correlation between prior knowledge and MLLM performance. Moreover, we find that domain-specific fine-tuning using solely end-to-end visual question answering (VQA) data is insufficient, particularly for entities with low inherent visual prior knowledge. To address this issue, we propose VisPRE (Vision Prior Remediation), a two-stage training framework that explicitly incorporates prior knowledge at the vision encoder level. Experimental results demonstrate that augmenting the vision encoder's prior knowledge substantially boosts the visual understanding capabilities of MLLMs, offering a novel and effective strategy for improving performance, especially in scenarios involving uncommon visual entities.
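
The exact definition of Rank_e is given in the paper; as a rough, illustrative sketch of what a rank-based probe of a vision encoder's prior knowledge could look like, the snippet below ranks the correct entity label among distractor labels by CLIP image-text similarity. The checkpoint, prompt template, and candidate set are assumptions for illustration, not the paper's setup.

# Hypothetical rank-based probe of a vision encoder's prior knowledge
# about a visual entity. This is NOT the paper's Rank_e metric; it only
# illustrates the general idea: a lower rank for the correct label
# suggests the encoder already "knows" the entity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def entity_rank(image: Image.Image, true_entity: str, candidates: list[str]) -> int:
    """Return the rank (1 = best) of the true entity among candidate labels,
    scored by CLIP image-text similarity."""
    labels = [true_entity] + [c for c in candidates if c != true_entity]
    inputs = processor(
        text=[f"a photo of {label}" for label in labels],
        images=image,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits_per_image[0]  # similarity to each label
    order = logits.argsort(descending=True)           # labels sorted by similarity
    # Position of the true entity (index 0) in the sorted order, 1-indexed.
    return (order == 0).nonzero(as_tuple=True)[0].item() + 1

# Example usage (hypothetical image and candidate set):
# img = Image.open("shiba_inu.jpg")
# print(entity_rank(img, "Shiba Inu", ["Akita", "Corgi", "Husky", "Beagle"]))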

@article{liang2025_2503.18034,
  title={Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models},
  author={Qiao Liang and Yanjiang Liu and Ben He and Yaojie Lu and Hongyu Lin and Jia Zheng and Xianpei Han and Le Sun and Yingfei Sun},
  journal={arXiv preprint arXiv:2503.18034},
  year={2025}
}