2503.05255
Cited By
v1
v2 (latest)
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
7 March 2025
Guanghao Zhang
Tao Zhong
Yan Xia
Zhelun Yu
Haoyang Li
Wanggui He
Fangxun Shu
Mushui Liu
D. She
Yi Wang
Hao Jiang
LRM
Papers citing
"CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation"
10 / 10 papers shown
Title
DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning
Eddison Pham
Prisha Priyadarshini
Adrian Maliackel
Kanishk Bandi
Cristian Meo
Kevin Zhu
36
0
0
27 Oct 2025
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
Xingang Guo
Utkarsh Tyagi
Advait Gosai
Paula Vergara
Ernesto Gabriel Hernández Montoya
...
Bin Hu
Yunzhong He
Bing Liu
Rakshith S Srinivasa
VLM
LRM
208
1
0
14 Oct 2025
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
Chengqi Duan
Kaiyue Sun
Rongyao Fang
M. Zhang
Yan Feng
...
Peng Pei
Xunliang Cai
Hongsheng Li
Yi Ma
Xihui Liu
ReLM
OffRL
LRM
111
3
0
13 Oct 2025
Latent Visual Reasoning
Bangzheng Li
Ximeng Sun
Jiang-Long Liu
Ze Wang
Jialian Wu
Xiaodong Yu
Hao Chen
Emad Barsoum
Muhao Chen
Zicheng Liu
LRM
VLM
136
2
0
29 Sep 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
327
3
0
29 Sep 2025
SAIL-VL2 Technical Report
Weijie Yin
Yongjie Ye
Fangxun Shu
Yue Liao
Zijian Kang
...
Han Wang
Wenzhuo Liu
Xiao Liang
Shuicheng Yan
Chao Feng
LRM
VLM
168
2
0
17 Sep 2025
Simple o3: Towards Interleaved Vision-Language Reasoning
Ye Wang
Qianglong Chen
Zejun Li
Siyuan Wang
Shijie Guo
Zhirui Zhang
Zhongyu Wei
MLLM
LRM
VLM
120
8
0
16 Aug 2025
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Wenbin An
Jiahao Nie
Yaqiang Wu
Feng Tian
Shijian Lu
Q. Zheng
MLLM
102
0
0
14 Aug 2025
Geoint-R1: Formalizing Multimodal Geometric Reasoning with Dynamic Auxiliary Constructions
Jingxuan Wei
Caijun Jia
Qi Chen
Honghao He
Linzhuang Sun
Conghui He
Lijun Wu
Bihui Yu
Cheng Tan
LRM
118
2
0
05 Aug 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
279
2
0
20 May 2025