Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study

17 February 2025
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
Abstract

Recently, inference-time scaling of chain-of-thought (CoT) has been demonstrated as a promising approach for addressing multi-modal reasoning tasks. While existing studies have predominantly centered on text-based thinking, the integration of both visual and textual modalities within the reasoning process remains unexplored. In this study, we pioneer the exploration of inference-time scaling with multi-modal thought, aiming to bridge this gap. To provide a comprehensive analysis, we systematically investigate popular sampling-based and tree search-based inference-time scaling methods on 10 challenging tasks spanning various domains. In addition, we adopt the same consistency-enhanced verifier throughout to ensure effective guidance for both methods across different thought paradigms. Results show that multi-modal thought yields better performance than conventional text-only thought, and that blending the two types of thought fosters more diverse thinking. Despite these advantages, multi-modal thought consumes more tokens because it must process richer visual inputs, which raises concerns for practical applications. We hope that our findings on the merits and drawbacks of this research line will inspire future work in the field.
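To make "sampling-based inference-time scaling guided by a verifier" concrete, here is a minimal Python sketch of two generic selection rules: best-of-N, and verifier-weighted voting over final answers. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify how the consistency-enhanced verifier works, and the names `best_of_n`, `weighted_vote`, `toy_verify`, and the toy scores are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A candidate is a (thought, answer) pair. In a real system the thought could
# interleave text and images; here it is a plain string (assumption).
Candidate = Tuple[str, str]


def best_of_n(sample: Callable[[], Candidate],
              verify: Callable[[Candidate], float],
              n: int) -> Candidate:
    """Draw n candidates and return the one the verifier scores highest."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=verify)


def weighted_vote(candidates: List[Candidate],
                  verify: Callable[[Candidate], float]) -> str:
    """Sum verifier scores per final answer and return the top answer.

    This blends verifier guidance with answer agreement across samples.
    It is a generic technique, not necessarily the paper's verifier.
    """
    totals: Dict[str, float] = defaultdict(float)
    for candidate in candidates:
        totals[candidate[1]] += verify(candidate)
    return max(totals, key=totals.get)


# Toy data standing in for sampled reasoning chains (hypothetical values).
toy_candidates: List[Candidate] = [("t1", "A"), ("t2", "B"), ("t3", "A")]
toy_scores: Dict[str, float] = {"t1": 0.4, "t2": 0.7, "t3": 0.5}


def toy_verify(candidate: Candidate) -> float:
    """Look up a fixed score for the candidate's thought."""
    return toy_scores[candidate[0]]


pool = iter(toy_candidates)
top_candidate = best_of_n(lambda: next(pool), toy_verify, n=3)
voted_answer = weighted_vote(toy_candidates, toy_verify)
```

On the toy data the two rules disagree: best-of-N keeps the single highest-scoring chain, while weighted voting prefers the answer that two lower-scoring chains agree on. Under either rule, raising `n` spends more inference-time compute, and each sampled multi-modal thought adds to the token cost the abstract flags as a practical concern.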

@article{lin2025_2502.11514,
  title={Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study},
  author={Yujie Lin and Ante Wang and Moye Chen and Jingyao Liu and Hao Liu and Jinsong Su and Xinyan Xiao},
  journal={arXiv preprint arXiv:2502.11514},
  year={2025}
}