arXiv: 2508.04416
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning

6 August 2025
H. Zhang
Xin Gu
Jiawen Li
Chixiang Ma
Sule Bai
Chubin Zhang
Bowen Zhang
Zhichao Zhou
Dongliang He
Yansong Tang
OffRL, LRM

Papers citing "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

15 / 15 papers shown
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
Pengfei Hu
Meng Cao
Y. Wang
Yi Wang
Jiahua Dong
Jun Song
Yu Cheng
Bo Zheng
Xiaodan Liang
LRM, VLM
137
0
0
30 Nov 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
H. Rasheed
Mohammed Zumri
Muhammad Maaz
Ming-Hsuan Yang
Fahad Shahbaz Khan
Salman Khan
AI4TS, LRM
164
0
0
28 Nov 2025
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
Xin Gu
H. Zhang
Qihang Fan
Jingxuan Niu
Zhipeng Zhang
Libo Zhang
G. Chen
Fan Chen
Longyin Wen
Sijie Zhu
AI4TS, LRM
327
1
0
26 Nov 2025
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Zuhao Yang
Sudong Wang
Kaichen Zhang
Keming Wu
Sicong Leng
...
Bo Li
Chengwei Qin
Shijian Lu
X. Li
Lidong Bing
LRM, VLM
178
5
0
25 Nov 2025
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen
Zikang Wang
Zhengrong Yue
Kainan Yan
Chenyun Yu
...
Yafei Wen
Xiaoxin Chen
Yang Liu
Peng Li
Yali Wang
LLMAG
324
3
0
24 Nov 2025
VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models
Fufangchen Zhao
Liao Zhang
Daiqi Shi
Yuanjun Gao
Chen Ye
Yang Cai
Jian Gao
Danfeng Yan
VLM
140
0
0
24 Nov 2025
Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems
Jeffrey Wen
Rizwan Ahmad
Philip Schniter
MedIm
333
0
0
17 Nov 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
212
3
0
28 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
88
0
0
17 Oct 2025
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Zhenlong Yuan
Xiangyan Qu
Chengxuan Qian
Rui Chen
Jing Tang
...
Xiangxiang Chu
Dapeng Zhang
Yiwei Wang
Y. Cai
Shuo Li
VLM, LRM
140
8
0
09 Oct 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Yunlong Tang
Jing Bi
Pinxin Liu
Zhenyu Pan
Mingqian Feng
...
Zeliang Zhang
Daiki Shimada
Han Liu
Jiebo Luo
Chenliang Xu
MLLM, OffRL, VLM, LRM
742
8
0
06 Oct 2025
TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos
Xiangrui Liu
Minghao Qin
Yan Shu
Zhengyang Liang
Yang Tian
Chen Jason Zhang
Bo Zhao
Zheng Liu
319
0
0
30 Sep 2025
TAMA: Tool-Augmented Multimodal Agent for Procedural Activity Understanding
Kimihiro Hasegawa
Wiradee Imrattanatrai
Masaki Asada
Ken Fukuda
Teruko Mitamura
144
0
0
30 Sep 2025
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu
Q. Yang
Yuan-Ming Li
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
LRM
164
7
0
29 Sep 2025
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang
Zhibin Wang
Yinchao Ma
Jiawei Peng
Y. Wang
Qiang Zhou
Jun Song
Bo Zheng
OffRL, AI4TS, LRM
230
2
0
28 Sep 2025