Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

10 January 2024
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
    LRM

Papers citing "Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning"

50 / 61 papers shown
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
J. A. Zhang
Chuanqi Cheng
Y. Liu
W. Liu
Jian Luan
Rui Yan
19
0
0
28 Apr 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
W. L. Xiao
Leilei Gan
Weilong Dai
Wanggui He
Ziwei Huang
...
Fangxun Shu
Zhelun Yu
Peng Zhang
Hao Jiang
Fei Wu
ReLM
LRM
AI4CE
68
0
0
25 Apr 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Zhikai Wang
Jiashuo Sun
W. Zhang
Zhiqiang Hu
Xin Li
F. Wang
Deli Zhao
VLM
LRM
70
0
0
24 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM
LRM
34
0
0
15 Apr 2025
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving
Zeyu Zhang
Z. Chen
Zicheng Zhang
Yuze Sun
Yuan Tian
Ziheng Jia
Chunyi Li
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
MLLM
31
0
0
15 Apr 2025
Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation
Z. Li
Jinzhi Deng
Haibing Ma
Chi Zhang
Dan Xiao
20
0
0
31 Mar 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Li Lyna Zhang
Longxi Gao
Mengwei Xu
LRM
37
0
0
21 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
40
1
0
19 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
77
0
0
13 Mar 2025
Interpretable and Robust Dialogue State Tracking via Natural Language Summarization with LLMs
Rafael Carranza
Mateo Alejandro Rojas
55
0
0
11 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
86
29
0
10 Mar 2025
Fine-Grained Retrieval-Augmented Generation for Visual Question Answering
Zhengxuan Zhang
Yin Wu
Yuyu Luo
Nan Tang
30
0
0
28 Feb 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
95
0
0
18 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
111
9
0
05 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
57
7
0
04 Feb 2025
Position: Empowering Time Series Reasoning with Multimodal LLMs
Yaxuan Kong
Yiyuan Yang
Shiyu Wang
Chenghao Liu
Yuxuan Liang
Ming Jin
Stefan Zohren
Dan Pei
Y. Liu
Qingsong Wen
AI4TS
LRM
66
2
0
03 Feb 2025
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
H. Malik
Fahad Shamshad
Muzammal Naseer
Karthik Nandakumar
F. Khan
Salman Khan
AAML
MLLM
VLM
66
0
0
03 Feb 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles
Maxime Gasse
Alexandre Lacoste
Alexandre Drouin
Massimo Caccia
...
Siva Reddy
Quentin Cappart
Graham Neubig
Ruslan Salakhutdinov
Nicolas Chapados
LLMAG
96
9
0
06 Dec 2024
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
Harsha Vardhan Khurdula
Basem Rizk
Indus Khaitan
Janit Anjaria
Aviral Srivastava
Rajvardhan Khaitan
ELM
VLM
LRM
59
0
0
20 Nov 2024
See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers
Jiaxin Zhuang
Leon Yan
Zhenwei Zhang
Ruiqi Wang
Jiawei Zhang
Yuantao Gu
AI4TS
24
7
0
04 Nov 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
D. Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
52
1
0
25 Oct 2024
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han
Tianqing Fang
Zhaowei Wang
Y. Song
Mark Steedman
LRM
19
0
0
15 Oct 2024
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Hang Li
B. Li
...
Kun Wang
Hui Xiong
Philip S. Yu
Xuming Hu
Qingsong Wen
LRM
25
13
0
06 Oct 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM
AI4CE
23
0
0
19 Sep 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
M. Zhang
Xunliang Cai
LRM
29
6
0
21 Aug 2024
CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment
Akira Kasuga
Ryo Yonetani
26
0
0
31 Jul 2024
CoDefeater: Using LLMs To Find Defeaters in Assurance Cases
Usman Gohar
Michael C. Hunter
Robyn R. Lutz
Myra B. Cohen
OffRL
16
6
0
18 Jul 2024
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen
Haibin Wang
Yilun Huang
Ce Ge
Yaliang Li
Bolin Ding
Jingren Zhou
VLM
SyDa
59
0
0
16 Jul 2024
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
50
2
0
15 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
45
5
0
11 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
46
0
0
04 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian-Yu Guan
Wei Wu
Rui Yan
LRM
35
10
0
28 Jun 2024
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi
Mehran Kazemi
Anton Tsitsulin
Karishma Malkan
Jinyeong Yim
John Palowitch
Sungyong Seo
Jonathan J. Halcrow
Bryan Perozzi
LRM
32
26
0
13 Jun 2024
What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models
Enis Berk Çoban
Michael I. Mandel
Johanna Devaney
AuLLM
LRM
30
0
0
07 Jun 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
31
0
0
29 May 2024
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games
Aoran Mei
Jianhua Wang
Guo-Niu Zhu
Zhongxue Gan
34
6
0
22 May 2024
ALCM: Autonomous LLM-Augmented Causal Discovery Framework
Elahe Khatibi
Mahyar Abbasian
Zhongqi Yang
Iman Azimi
Amir M. Rahmani
54
11
0
02 May 2024
PerkwE_COQA: Enhanced Persian Conversational Question Answering by combining contextual keyword extraction with Large Language Models
Pardis Moradbeiki
Nasser Ghadiri
29
0
0
08 Apr 2024
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li
Anh Dao
Wentao Bao
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
CVBM
39
14
0
07 Apr 2024
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
H. Nghiem
Hal Daumé
23
1
0
18 Mar 2024
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models
Lizhou Fan
Wenyue Hua
Xiang Li
Kaijie Zhu
Mingyu Jin
...
Haoyang Ling
Jinkui Chi
Jindong Wang
Xin Ma
Yongfeng Zhang
LRM
27
14
0
04 Mar 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Haogeng Liu
Quanzeng You
Xiaotian Han
Yiqi Wang
Bohan Zhai
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
MLLM
36
9
0
03 Mar 2024
Explaining latent representations of generative models with large multimodal models
Mengdan Zhu
Zhenke Liu
Bo Pan
Abhinav Angirekula
Liang Zhao
24
2
0
02 Feb 2024
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
19
15
0
07 Dec 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
154
576
0
06 Apr 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
145
386
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
270
0
03 Oct 2022