2508.00222
v4 (latest)
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
31 July 2025
Yihong Dong
Xue Jiang
Yongding Tao
Huanyu Liu
Kechi Zhang
Lili Mou
Rongyu Cao
Yingwei Ma
Jue Chen
Binhua Li
Zhi Jin
Fei Huang
Y. Li
Ge Li
LRM
HuggingFace (6 upvotes)
Github (14★)
Papers citing "RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization"
14 / 14 papers shown
Title
BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation
Haoyuan Li
Zhengyuan Shen
Sullam Jeoung
Yueyan Chen
Jiayu Li
Qi Zhu
Shuai Wang
V. Ioannidis
Huzefa Rangwala
134
0
0
23 Oct 2025
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
Yihong Dong
Zhaoyu Ma
Xue Jiang
Zhiyuan Fan
Jiaru Qian
...
Rongyu Cao
B. Li
Fei Huang
Yongbin Li
Ge Li
104
2
0
20 Oct 2025
SimKO: Simple Pass@K Policy Optimization
Ruotian Peng
Yi Ren
Zhouliang Yu
Weiyang Liu
Yandong Wen
204
2
0
16 Oct 2025
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
Can Xie
Ruotong Pan
Xiangyu Wu
Y. Zhang
Jiayi Fu
Tingting Gao
G. Zhou
OffRL
LRM
116
0
0
12 Oct 2025
Kaputt: A Large-Scale Dataset for Visual Defect Detection
Sebastian Höfer
Dorian Henning
Artemij Amiranashvili
D. Morrison
Mariliza Tzes
Ingmar Posner
Marc Matvienko
Alessandro Rennola
Anton Milan
113
0
0
07 Oct 2025
MCCE: A Framework for Multi-LLM Collaborative Co-Evolution
Nian Ran
Zhongzheng Li
Yue Wang
Qingsong Ran
Xiaoyuan Zhang
Shikun Feng
Richard Allmendinger
Xiaoguang Zhao
80
0
0
06 Oct 2025
The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View
Xinhao Yao
Lu Yu
Xiaolin Hu
Fengwei Teng
Qing Cui
Jun Zhou
Yong Liu
LRM
133
0
0
05 Oct 2025
How LLMs Learn to Reason: A Complex Network Perspective
Sihan Hu
X-D Cai
Yuan Huang
Zhiyuan Yao
Linfeng Zhang
Pan Zhang
Youjin Deng
Kun Chen
LRM
145
1
0
28 Sep 2025
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Xu Wujiang
Wentian Zhao
Zhenting Wang
Li Yu-Jhe
Jin Can
Jin Mingyu
Mei Kai
Wan Kun
Metaxas Dimitris
80
0
0
26 Sep 2025
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Long Li
Jiaran Hao
Jason Klein Liu
Zhijian Zhou
Yanting Miao
...
Wei Chu
Zhe Wang
Shirui Pan
Chao Qu
Yuan Qi
131
5
0
09 Sep 2025
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Yang Zhou
Sunzhu Li
Shunyu Liu
Wenkai Fang
Jiale Zhao
...
Hengtong Lu
Wei Chen
Yan Xie
Mingli Song
Weilong Dai
LRM
212
7
0
23 Aug 2025
Intern-S1: A Scientific Multimodal Foundation Model
Wenlong Zhang
Zhongrui Cai
Maosong Cao
Weihan Cao
C. Chen
...
Wenchang Ning
Xinle Pang
Jiahui Peng
Runyu Peng
Yu Qiao
MoE
LRM
89
29
0
21 Aug 2025
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang
Yuexiang Xie
Yuchang Sun
Yanxi Chen
Guoyin Wang
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
159
27
0
15 Aug 2025
FAN: Fourier Analysis Networks
Yihong Dong
Ge Li
Yongding Tao
Xue Jiang
Kechi Zhang
Jia Li
Jinliang Deng
Jing Su
Jun Zhang
Jingjing Xu
AI4TS
368
19
0
03 Oct 2024