MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

13 February 2025

Papers citing "MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency"

50 / 70 papers shown

When to Think and When to Look: Uncertainty-Guided Lookback

...

344

30 Mar 2026

Reinforcement Learning for Large Model: A Survey

408

24 Dec 2025

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

...

202

04 Dec 2025

Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens

207

30 Nov 2025

AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

...

Jianxi Huang

Juepeng Zheng

LRM

205

28 Nov 2025

DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

147

27 Nov 2025

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

...

277

26 Nov 2025

Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

...

378

19 Nov 2025

From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models

529

17 Nov 2025

Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation

131

01 Nov 2025

MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models

151

31 Oct 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

241

30 Oct 2025

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

440

30 Oct 2025

PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection

575

27 Oct 2025

S-Chain: Structured Visual Chain-of-Thought For Medicine

...

175

26 Oct 2025

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

525

24 Oct 2025

What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation

281

23 Oct 2025

A Survey on Agentic Multimodal Large Language Models

...

LM&Ro AIFin AI4TS LRM AI4CE

302

13 Oct 2025

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

...

241

12 Oct 2025

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

184

10 Oct 2025

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond

...

147

30 Sep 2025

Visual CoT Makes VLMs Smarter but More Fragile

177

28 Sep 2025

Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation

401

23 Sep 2025

From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning

...

275

21 Sep 2025

DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training

Qi Cao

P. Xie

OffRL

202

05 Sep 2025

MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models

176

19 Aug 2025

RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning

439

17 Aug 2025

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

...

281

13 Aug 2025

MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models

...

270

11 Aug 2025

Humans Perceive Wrong Narratives from AI Reasoning Texts

Mosh Levy

Zohar Elyoseph

Yoav Goldberg

226

09 Aug 2025

ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges

112

06 Aug 2025

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

...

205

06 Aug 2025

Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations

201

27 Jul 2025

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

...

201

25 Jun 2025

MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models

185

24 Jun 2025

Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

191

09 Jun 2025

Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT

317

30 May 2025

Reinforcing Video Reasoning with Focused Thinking

410

30 May 2025

THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models

286

28 May 2025

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

Qi Cao

Ruiyi Wang

Ruiyi Zhang

Sai Ashish Somayajula

P. Xie

LRM

479

26 May 2025

Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

489

24 May 2025

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains

660

24 May 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

364

22 May 2025

Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning

229

22 May 2025

ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations

...

Hairong Dong

Dingkang Yang

LRM

471

20 May 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

536

121

01 May 2025

Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models

673

30 Apr 2025

GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets

352

28 Apr 2025

TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

...

488

22 Apr 2025

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark

529

20 Apr 2025