2508.04416
Cited By
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
6 August 2025
H. Zhang
Xin Gu
Jiawen Li
Chixiang Ma
Sule Bai
Chubin Zhang
Bowen Zhang
Zhichao Zhou
Dongliang He
Yansong Tang
OffRL
LRM
Papers citing
"Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
15 / 15 papers shown
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
Pengfei Hu
Meng Cao
Y. Wang
Yi Wang
Jiahua Dong
Jun Song
Yu Cheng
Bo Zheng
Xiaodan Liang
LRM
VLM
137
0
0
30 Nov 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
H. Rasheed
Mohammed Zumri
Muhammad Maaz
Ming-Hsuan Yang
Fahad Shahbaz Khan
Salman Khan
AI4TS
LRM
164
0
0
28 Nov 2025
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
Xin Gu
H. Zhang
Qihang Fan
Jingxuan Niu
Zhipeng Zhang
Libo Zhang
G. Chen
Fan Chen
Longyin Wen
Sijie Zhu
AI4TS
LRM
327
1
0
26 Nov 2025
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Zuhao Yang
Sudong Wang
Kaichen Zhang
Keming Wu
Sicong Leng
...
Bo Li
Chengwei Qin
Shijian Lu
X. Li
Lidong Bing
LRM
VLM
178
5
0
25 Nov 2025
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen
Zikang Wang
Zhengrong Yue
Kainan Yan
Chenyun Yu
...
Yafei Wen
Xiaoxin Chen
Yang Liu
Peng Li
Yali Wang
LLMAG
324
3
0
24 Nov 2025
VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models
Fufangchen Zhao
Liao Zhang
Daiqi Shi
Yuanjun Gao
Chen Ye
Yang Cai
Jian Gao
Danfeng Yan
VLM
140
0
0
24 Nov 2025
Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems
Jeffrey Wen
Rizwan Ahmad
Philip Schniter
MedIm
333
0
0
17 Nov 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
212
3
0
28 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
88
0
0
17 Oct 2025
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Zhenlong Yuan
Xiangyan Qu
Chengxuan Qian
Rui Chen
Jing Tang
...
Xiangxiang Chu
Dapeng Zhang
Yiwei Wang
Y. Cai
Shuo Li
VLM
LRM
140
8
0
09 Oct 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Yunlong Tang
Jing Bi
Pinxin Liu
Zhenyu Pan
Mingqian Feng
...
Zeliang Zhang
Daiki Shimada
Han Liu
Jiebo Luo
Chenliang Xu
MLLM
OffRL
VLM
LRM
742
8
0
06 Oct 2025
TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos
Xiangrui Liu
Minghao Qin
Yan Shu
Zhengyang Liang
Yang Tian
Chen Jason Zhang
Bo Zhao
Zheng Liu
319
0
0
30 Sep 2025
TAMA: Tool-Augmented Multimodal Agent for Procedural Activity Understanding
Kimihiro Hasegawa
Wiradee Imrattanatrai
Masaki Asada
Ken Fukuda
Teruko Mitamura
144
0
0
30 Sep 2025
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu
Q. Yang
Yuan-Ming Li
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
LRM
164
7
0
29 Sep 2025
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang
Zhibin Wang
Yinchao Ma
Jiawei Peng
Y. Wang
Qiang Zhou
Jun Song
Bo Zheng
OffRL
AI4TS
LRM
230
2
0
28 Sep 2025