Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.05132
Cited By
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
7 March 2025
Hengguang Zhou
Xirui Li
Ruochen Wang
Minhao Cheng
Tianyi Zhou
Cho-Jui Hsieh
OffRL
LRM
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model"
34 / 34 papers shown
Title
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang
Jielin Qiu
Wenhao Ding
Zuxin Liu
James Tompkin
Mengdi Xu
Mengzhou Xia
Zhengzhong Tu
Laixi Shi
Jiacheng Zhu
OffRL
32
0
0
30 May 2025
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
Liyun Zhu
Qixiang Chen
Xi Shen
Xiaodong Cun
AI4TS
LRM
33
0
0
29 May 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang
Shiqian Su
Shi-Qi Liu
Xuan Dong
Yue Yu
...
Hao Li
Wenhai Wang
Yu Qiao
Xizhou Zhu
Jifeng Dai
OffRL
35
0
0
29 May 2025
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Lai Wei
Yuting Li
Kaipeng Zheng
Chen Wang
Yue Wang
Linghe Kong
Lichao Sun
Weiran Huang
OffRL
ReLM
LRM
29
0
0
28 May 2025
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs
Zhehan Kan
Y. Liu
Kun Yin
Xinghua Jiang
Xin Li
...
Yinsong Liu
D. Jiang
Xing Sun
Qingmin Liao
Wenming Yang
LRM
13
0
0
27 May 2025
Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning
Ruolin Shen
Xiaozhong Ji
Kai WU
Jiangning Zhang
Yijun He
HaiHua Yang
Xiaobin Hu
Xiaoyu Sun
40
0
0
26 May 2025
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu
Jingcheng Yang
Jize Jiang
Meitang Li
Kaizhuo Yan
Hanchao Yu
Minjia Zhang
Chengxiang Zhai
Klara Nahrstedt
LRM
90
0
0
25 May 2025
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
Haoyuan Sun
Jiaqi Wu
Bo Xia
Yifu Luo
Yifei Zhao
Kai Qin
Xufei Lv
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
OffRL
LRM
137
0
0
24 May 2025
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL
LRM
71
0
0
23 May 2025
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games
Xiaoqing Zhang
Huabin Zheng
Ang Lv
Yuhan Liu
Zirui Song
Flood Sung
Xiuying Chen
Rui Yan
OffRL
ReLM
LRM
AI4CE
54
0
0
22 May 2025
Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL
Xintong Zhang
Zhi Gao
Bofei Zhang
Pengxiang Li
Xiaowen Zhang
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LRM
78
0
0
21 May 2025
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models
Wenhui Zhu
Xuanzhao Dong
Xin Li
Peijie Qiu
Xiwen Chen
Abolfazl Razi
Aris Sotiras
Yi Su
Yalin Wang
OffRL
LM&MA
54
0
0
20 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
47
2
0
19 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro
LRM
95
0
0
16 May 2025
Beyond Áha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Yansen Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
53
2
0
15 May 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing
Xiaowei Hu
Chi-Wing Fu
Wei Wang
Jifeng Dai
Pheng-Ann Heng
MLLM
OffRL
VLM
LRM
76
2
0
07 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OOD
LRM
VLM
77
6
0
06 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
119
3
0
05 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
90
4
0
30 Apr 2025
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
Rui Chen
Lei Sun
Jing Tang
Geng Li
Xiangxiang Chu
LRM
53
0
0
14 Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
LRM
91
8
0
13 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
Xianfeng Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM
VLM
OffRL
LRM
111
20
0
10 Apr 2025
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models
Sihang Li
Yansen Wang
Ruipeng Wang
Zijun Yao
Kun Wang
An Zhang
Xiang Wang
Tat-Seng Chua
AAML
LRM
78
7
0
09 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Lu Dong
Xiangyu Zeng
Yinan He
Yun Wang
Yu Qiao
Yi Wang
Limin Wang
VLM
AI4TS
LRM
75
18
0
09 Apr 2025
On the Suitability of Reinforcement Fine-Tuning to Visual Tasks
X. Chen
Wei Li
Chunxu Liu
Chi Xie
Xiaoyan Hu
Chengqian Ma
Feng Zhu
Rui Zhao
ReLM
LRM
86
2
0
08 Apr 2025
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
Yi-Long Lu
Chunhui Zhang
Jiajun Song
Lifeng Fan
Wei Wang
OffRL
74
0
0
02 Apr 2025
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Zhenyi Liao
Qingsong Xie
Yanhao Zhang
Zijian Kong
Haonan Lu
Zhenyu Yang
Zhijie Deng
ReLM
VLM
LRM
128
5
1
01 Apr 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
124
35
0
27 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
93
5
0
23 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jing Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
114
10
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yize Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
314
3
0
18 Mar 2025
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Gang Li
Jizhong Liu
Heinrich Dinkel
Yadong Niu
Junbo Zhang
Jian Luan
OffRL
LRM
ReLM
93
10
0
14 Mar 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Omkar Thawakar
Dinura Dissanayake
Ketan More
Ritesh Thawkar
Ahmed Heakl
...
Hisham Cholakkal
Ivan Laptev
Mubarak Shah
Fahad Shahbaz Khan
Salman Khan
VLM
LRM
80
46
0
10 Jan 2025
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
245
18,685
0
20 Jul 2017
1