2501.03230
Cited By
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
International Conference on Machine Learning (ICML), 2024
8 January 2025
Hao Fei
Shengqiong Wu
Wei Ji
Hao Zhang
Hao Fei
Yang Deng
Wynne Hsu
LRM
VGen
Papers citing
"Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
50 / 91 papers shown
Title
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Wenxin Zhu
Andong Chen
Yuchen Song
Kehai Chen
Conghui Zhu
Ziyan Chen
Tiejun Zhao
LRM
398
0
0
17 Nov 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang
D. Zhang
Huichi Zhou
Rui Huang
Bobo Li
...
Shengqiong Wu
X. Wang
Jiebo Luo
Lizi Liao
Hao Fei
VGen
161
0
0
11 Nov 2025
CoCoVa: Chain of Continuous Vision-Language Thought for Latent Space Reasoning
Jizheng Ma
Xiaofei Zhou
Yanlong Song
Han Yan
VLM
LRM
153
1
0
04 Nov 2025
StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
Yuhang Hu
Zhenyu Yang
S. S. Wang
Shengsheng Qian
Bin Wen
Fan Yang
Tingting Gao
Changsheng Xu
VGen
LRM
124
0
0
29 Oct 2025
Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection
Cui Yakun
Fushuo Huo
Weijie Shi
Juntao Dai
Hang Du
Z. Zhu
Sirui Han
Yike Guo
66
0
0
28 Oct 2025
MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection
Anisha Saha
Varsha Suresh
Timothy Hospedales
Vera Demberg
LRM
61
0
0
27 Oct 2025
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Shijian Wang
Jiarui Jin
Xingjian Wang
L. Song
Runhao Fu
H. Wang
Zongyuan Ge
Yuan Lu
Xuelian Cheng
ReLM
LRM
100
5
0
27 Oct 2025
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
Xiongkun Linghu
Jiangyong Huang
Ziyu Zhu
Baoxiong Jia
Siyuan Huang
LRM
113
1
0
19 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
68
0
0
17 Oct 2025
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
M. Luo
Zihui Xue
Alex Dimakis
Kristen Grauman
VGen
LRM
228
4
0
07 Oct 2025
Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
Jianxin Liang
Tan Yue
Yuxuan Wang
Yueqian Wang
Zhihan Yin
Huishuai Zhang
Dongyan Zhao
80
0
0
29 Sep 2025
MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning
Sicheng Tao
Jia-Chen Gu
Yibo Yan
Junyan Zhang
Yubo Gao
...
Shuhang Xun
Yuxuan Fan
Hong Chen
Jianxiang He
Xuming Hu
LRM
308
4
0
25 Sep 2025
Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning
Guoxin Wang
Jun Zhao
Xinyi Liu
Yanbo Liu
Xuyang Cao
...
Zhuoyun Liu
Qintian Sun
Fangru Zhou
Haoqiang Xing
Zhenhong Yang
LRM
150
1
0
23 Sep 2025
LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection
Lanhu Wu
Zilin Gao
Hao Fei
Mong-Li Lee
Wynne Hsu
Mamba
148
0
0
23 Sep 2025
3D Aware Region Prompted Vision Language Model
A. Cheng
Yang Fu
Yukang Chen
Zhijian Liu
X. Li
...
Jan Kautz
Pavlo Molchanov
Hongxu Yin
Xiaolong Wang
Sifei Liu
115
7
0
16 Sep 2025
FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning
Haodong Chen
Haojian Huang
XinXiang Yin
Dian Shao
LRM
127
2
0
15 Sep 2025
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
Meng Luo
Shengqiong Wu
Liqiang Jing
Tianjie Ju
Li Zheng
...
Jiebo Luo
William Yang Wang
Hao Fei
Yang Deng
Wynne Hsu
144
1
0
15 Sep 2025
AdsQA: Towards Advertisement Video Understanding
Xinwei Long
Kai Tian
Peng Xu
Guoli Jia
Jingxuan Li
...
Che Jiang
Hao Xu
Yang Liu
Jiaheng Ma
Bowen Zhou
104
2
0
10 Sep 2025
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang
Yongcan Yu
Jian Liang
Ran He
HILM
LRM
189
4
0
04 Sep 2025
Why Do MLLMs Struggle with Spatial Understanding? A Systematic Analysis from Data to Architecture
Wanyue Zhang
Yibin Huang
Yangbin Xu
JingJing Huang
Helu Zhi
Shuo Ren
Wang Xu
Jiajun Zhang
LRM
84
11
0
02 Sep 2025
ProPy: Building Interactive Prompt Pyramids upon CLIP for Partially Relevant Video Retrieval
Yi Pan
Yujia Zhang
Michael C. Kampffmeyer
Xiaoguang Zhao
84
0
0
26 Aug 2025
See What You Need: Query-Aware Visual Intelligence through Reasoning-Perception Loops
Zixuan Dong
Baoyun Peng
Y. Wang
Lin Liu
Xinxin Dong
Yunlong Cao
Xiaodong Wang
LRM
52
1
0
25 Aug 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
Ashish Seth
Utkarsh Tyagi
Ramaneswaran Selvakumar
Nishit Anand
Sonal Kumar
Sreyan Ghosh
R. Duraiswami
Chirag Agarwal
Dinesh Manocha
MLLM
HILM
VLM
196
1
0
18 Aug 2025
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Wenbin An
Jiahao Nie
Yaqiang Wu
Feng Tian
Shijian Lu
Q. Zheng
MLLM
162
1
0
14 Aug 2025
Episodic Memory Representation for Long-form Video Understanding
Yun Wang
Long Zhang
Jingren Liu
Jiaqi Yan
Zhanjie Zhang
Jiahao Zheng
Xun Yang
Dapeng Wu
Xiangyu Chen
Xuelong Li
112
3
0
13 Aug 2025
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
H. Zhang
Xin Gu
Jiawen Li
Chixiang Ma
Sule Bai
Chubin Zhang
Bowen Zhang
Zhichao Zhou
Dongliang He
Yansong Tang
OffRL
LRM
149
22
0
06 Aug 2025
StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding
Haolin Yang
Feilong Tang
Linxiao Zhao
Xiang An
Ming Hu
...
Yifan Lu
Xiaofeng Zhang
Abdalla Swikir
Junjun He
Zongyuan Ge
271
2
0
03 Aug 2025
CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
Wentao Zhang
CML
ELM
LRM
229
3
0
22 Jul 2025
LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering
Xinxin Dong
Baoyun Peng
H. Ma
Y. Wang
Zixuan Dong
Fei Hu
Xiaodong Wang
141
0
0
20 Jul 2025
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang
Jaehong Yoon
Shoubin Yu
Md. Mohaiminul Islam
Gedas Bertasius
Mohit Bansal
OffRL
LRM
199
5
0
09 Jul 2025
Cautious Next Token Prediction
Yizhou Wang
Lingzhi Zhang
Yue Bai
M. Chiu
Zhengmian Hu
M. Zhang
Qihua Dong
Yu Yin
Sohrab Amirghodsi
Y. Fu
166
2
0
03 Jul 2025
Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search
Haoran Sun
Yankai Jiang
Wenjie Lou
Yujie Zhang
Wenjie Li
Lilong Wang
Mianxin Liu
Lei Liu
Xiaosong Wang
LRM
263
4
0
20 Jun 2025
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
Yifeng Gao
Yifan Ding
Hongyu Su
Juncheng Li
Yunhan Zhao
...
Li Wang
Xin Wang
Yixu Wang
Jiabo He
Yu-Gang Jiang
VGen
287
1
0
13 Jun 2025
VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?
Jiachen Yu
Yufei Zhan
Ziheng Wu
Yousong Zhu
Jinqiao Wang
Minghui Qiu
VLM
LRM
130
2
0
13 Jun 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
387
4
0
12 Jun 2025
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities
Wendong Bu
Yang Wu
Qifan Yu
Minghe Gao
Bingchen Miao
...
Mengze Li
Wei Ji
Juncheng Billy Li
Siliang Tang
Yueting Zhuang
ELM
137
1
0
10 Jun 2025
Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Ruiyang Zhang
Hu Zhang
Hao Fei
Zhedong Zheng
UQCV
226
0
0
09 Jun 2025
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
Daeun Lee
Jaehong Yoon
Jaemin Cho
Mohit Bansal
LRM
288
2
0
04 Jun 2025
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
Yang Yao
Lingyu Li
Jiaxin Song
Chiyu Chen
Zhenqi He
...
Xin Wang
Tianle Gu
Jie Li
Yan Teng
Yingchun Wang
LRM
254
0
0
03 Jun 2025
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
Sara Ghazanfari
Francesco Croce
Nicolas Flammarion
Prashanth Krishnamurthy
Farshad Khorrami
S. Garg
LRM
145
8
0
31 May 2025
SiLVR: A Simple Language-based Video Reasoning Framework
Ce Zhang
Yan-Bo Lin
Ziyang Wang
Mohit Bansal
Gedas Bertasius
LRM
150
7
0
30 May 2025
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation
Tony Montes
Fernando Lozano
254
2
0
21 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
303
2
0
20 May 2025
CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
Hongbo Jin
Ruyang Liu
Wenhao Zhang
Guibo Luo
Ge Li
LRM
302
1
0
17 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
952
1
0
06 May 2025
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani
Sachit Menon
Ahmet Iscen
Shyamal Buch
Ramin Mehran
...
Yukun Zhu
Carl Vondrick
Mikhail Sirotenko
Cordelia Schmid
Tobias Weyand
289
8
0
01 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
493
18
0
30 Apr 2025
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning
Baining Zhao
Liang Luo
Jianjie Fang
Chen Gao
Fanhang Man
Jinqiang Cui
Xin Wang
Xinlei Chen
Yong Li
Wenwu Zhu
LM&Ro
VLM
LRM
287
24
0
17 Apr 2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang
Haodong Chen
Shengqiong Wu
Meng Luo
Jinlan Fu
Xinya Du
Hao Zhang
Hao Fei
AI4TS
890
8
0
17 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
816
0
0
07 Apr 2025