B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners


23 December 2024
Weihao Zeng
Yuzhen Huang
Lulu Zhao
Yijun Wang
Zifei Shan
Junxian He
    LRM

Papers citing "B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners"

5 / 5 papers shown
WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
Tianqing Fang
H. M. Zhang
Z. Zhang
Kaixin Ma
W. Yu
Haitao Mi
Dong Yu
LLMAG
KELM
159
0
0
23 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
140
0
0
17 Apr 2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
FangZhi Xu
Hang Yan
Chang Ma
Haiteng Zhao
Qiushi Sun
Kanzhi Cheng
Junxian He
Jun Liu
Zhiyong Wu
LRM
29
1
0
11 Apr 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
91
38
0
24 Mar 2025
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li
Yinyi Luo
Anudeep Bolimera
Uzair Ahmed
Shri Kiran Srinivasan
Hrishikesh Gokhale
Marios Savvides
LRM
AI4CE
60
1
0
06 Mar 2025