Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning

Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge. However, whether these MLLMs possess human-like compositional reasoning abilities remains an open question. To unveil their reasoning behaviors, we first curate a \textbf{M}ultimodal \textbf{A}ssumptive \textbf{R}ea\textbf{s}oning Benchmark (MARS-Bench) in this paper. Interestingly, we find that most prevalent MLLMs are easily fooled when a presupposition is introduced into the question, even though such presuppositions appear trivial to human reasoners. In addition, we propose Active Deduction (AD), a simple yet effective reinforcement learning paradigm that encourages the model to actively perform composite deduction before reaching a final decision. Equipped with the proposed AD method, an MLLM demonstrates significant improvements in assumptive reasoning without compromising its general-purpose question-answering performance. We also provide extensive evaluations of both open-source and proprietary MLLMs on MARS-Bench, along with experimental analyses of the AD method.
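To make the two ideas in the abstract concrete, the following minimal Python sketch (not the authors' released code) illustrates, under assumed data formats, (1) how a presupposition-injected question of the kind probed by MARS-Bench might be constructed, and (2) how an Active-Deduction-style prompt could ask the model to verify the presupposition against the image before committing to an answer. The class `AssumptiveProbe`, the prompt templates, and the placeholder `query_mllm` are hypothetical illustrations; the paper's actual benchmark format and reinforcement learning training procedure are not shown here.

```python
# Minimal sketch of presupposition injection and an Active-Deduction-style prompt.
# All names below are illustrative assumptions, not the paper's released interface.

from dataclasses import dataclass


@dataclass
class AssumptiveProbe:
    image_id: str          # identifier of the image the question refers to
    base_question: str     # the original, presupposition-free question
    presupposition: str    # an injected assumption that may be false for the image


def build_direct_prompt(probe: AssumptiveProbe) -> str:
    """Fold the presupposition into the question, as in the 'fooling' setting."""
    return f"Given that {probe.presupposition}, {probe.base_question}"


def build_active_deduction_prompt(probe: AssumptiveProbe) -> str:
    """Ask the model to check the assumption against the image before deciding."""
    return (
        f"Question: {build_direct_prompt(probe)}\n"
        "Before answering, first state whether the assumption in the question is "
        "supported by the image. If it is not, say so explicitly, then answer the "
        "question based only on what the image actually shows."
    )


def query_mllm(image_id: str, prompt: str) -> str:
    """Hypothetical stand-in for an MLLM inference call (API or local model)."""
    raise NotImplementedError("Plug in your own MLLM backend here.")


if __name__ == "__main__":
    probe = AssumptiveProbe(
        image_id="img_0001",
        base_question="what color is the cat on the sofa?",
        presupposition="there is a cat on the sofa",
    )
    print(build_direct_prompt(probe))
    print()
    print(build_active_deduction_prompt(probe))
```

In this sketch the deduction step is elicited purely through prompting; in the paper, the abstract indicates that the deduction behavior is additionally reinforced through a reinforcement learning paradigm rather than the prompt template alone.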
@article{li2025_2404.12966,
  title   = {Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning},
  author  = {Yian Li and Wentao Tian and Yang Jiao and Jingjing Chen and Tianwen Qian and Bin Zhu and Na Zhao and Yu-Gang Jiang},
  journal = {arXiv preprint arXiv:2404.12966},
  year    = {2025}
}