Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

International Conference on Learning Representations (ICLR), 2025

4 October 2024

Lei Zhang

Wangmeng Zuo

Papers citing "Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning"

11 / 11 papers shown

LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge

...

487

2

0

03 Nov 2025

Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning

182

1

0

27 Sep 2025

Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

222

13

0

18 Jul 2025

VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?

287

5

0

13 Jun 2025

VLM-R$^3$: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

419

28

0

22 May 2025

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance

396

0

0

07 Apr 2025

MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection

829

28

0

23 Mar 2025

Mind with Eyes: from Language Reasoning to Multimodal Reasoning

358

20

0

23 Mar 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

William Yang Wang

753

151

0

16 Mar 2025

Don't Let Your Robot be Harmful: Responsible Robotic Manipulation via Safety-as-Policy

IEEE Robotics and Automation Letters (RA-L), 2024

447

1

0

27 Nov 2024

AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning

...

693

18

0

18 Nov 2024
