ResearchTrend.AI
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
arXiv 2508.00222 (latest version: v4)

31 July 2025
Yihong Dong, Xue Jiang, Yongding Tao, Huanyu Liu, Kechi Zhang, Lili Mou, Rongyu Cao, Yingwei Ma, Jue Chen, Binhua Li, Zhi Jin, Fei Huang, Y. Li, Ge Li
LRM

Papers citing "RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization"

BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation
Haoyuan Li, Zhengyuan Shen, Sullam Jeoung, Yueyan Chen, Jiayu Li, Qi Zhu, Shuai Wang, V. Ioannidis, Huzefa Rangwala
23 Oct 2025

Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
Yihong Dong, Zhaoyu Ma, Xue Jiang, Zhiyuan Fan, Jiaru Qian, ..., Rongyu Cao, B. Li, Fei Huang, Yongbin Li, Ge Li
20 Oct 2025

SimKO: Simple Pass@K Policy Optimization
Ruotian Peng, Yi Ren, Zhouliang Yu, Weiyang Liu, Yandong Wen
16 Oct 2025

Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
Can Xie, Ruotong Pan, Xiangyu Wu, Y. Zhang, Jiayi Fu, Tingting Gao, G. Zhou
OffRL LRM
12 Oct 2025

Kaputt: A Large-Scale Dataset for Visual Defect Detection
Sebastian Höfer, Dorian Henning, Artemij Amiranashvili, D. Morrison, Mariliza Tzes, Ingmar Posner, Marc Matvienko, Alessandro Rennola, Anton Milan
07 Oct 2025

MCCE: A Framework for Multi-LLM Collaborative Co-Evolution
Nian Ran, Zhongzheng Li, Yue Wang, Qingsong Ran, Xiaoyuan Zhang, Shikun Feng, Richard Allmendinger, Xiaoguang Zhao
06 Oct 2025

The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View
Xinhao Yao, Lu Yu, Xiaolin Hu, Fengwei Teng, Qing Cui, Jun Zhou, Yong Liu
LRM
05 Oct 2025

How LLMs Learn to Reason: A Complex Network Perspective
Sihan Hu, X-D Cai, Yuan Huang, Zhiyuan Yao, Linfeng Zhang, Pan Zhang, Youjin Deng, Kun Chen
LRM
28 Sep 2025

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Xu Wujiang, Wentian Zhao, Zhenting Wang, Li Yu-Jhe, Jin Can, Jin Mingyu, Mei Kai, Wan Kun, Metaxas Dimitris
26 Sep 2025

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Long Li, Jiaran Hao, Jason Klein Liu, Zhijian Zhou, Yanting Miao, ..., Wei Chu, Zhe Wang, Shirui Pan, Chao Qu, Yuan Qi
09 Sep 2025

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Yang Zhou, Sunzhu Li, Shunyu Liu, Wenkai Fang, Jiale Zhao, ..., Hengtong Lu, Wei Chen, Yan Xie, Mingli Song, Weilong Dai
LRM
23 Aug 2025

Intern-S1: A Scientific Multimodal Foundation Model
Wenlong Zhang, Zhongrui Cai, Maosong Cao, Weihan Cao, C. Chen, ..., Wenchang Ning, Xinle Pang, Jiahui Peng, Runyu Peng, Yu Qiao
MoE LRM
21 Aug 2025

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou
OffRL
15 Aug 2025

FAN: Fourier Analysis Networks
Yihong Dong, Ge Li, Yongding Tao, Xue Jiang, Kechi Zhang, Jia Li, Jinliang Deng, Jing Su, Jun Zhang, Jingjing Xu
AI4TS
03 Oct 2024