Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

24 February 2025
Alon Albalak
Duy Phung
Nathan Lile
Rafael Rafailov
Kanishk Gandhi
Louis Castricato
Anikait Singh
Chase Blagden
Robert Z. Sparks
Dakota Mahan
Nick Haber
OffRL, LRM

Papers citing "Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models"

50 / 54 papers shown
SPHINX: A Synthetic Environment for Visual Perception and Reasoning
Md Tanvirul Alam
Saksham Aggarwal
Justin Yang Chae
Nidhi Rastogi
ReLM, LRM
272
0
0
25 Nov 2025
VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision
Xuan Gong
Senmiao Wang
Hanbo Huang
Ruoyu Sun
Shiyu Liang
OffRL, LRM
94
0
0
31 Oct 2025
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Qianli Shen
Daoyuan Chen
Yilun Huang
Zhenqing Ling
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
140
0
0
30 Oct 2025
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Guoxin Chen
Jing Wu
Xinjie Chen
Wayne Xin Zhao
Ruihua Song
Chengxi Li
Kai Fan
Dayiheng Liu
Minpeng Liao
AIMat, OffRL
287
0
0
28 Oct 2025
UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
Da Zhang
Chenggang Rong
Bingyu Li
Feiyu Wang
Zhiyuan Zhao
Junyu Gao
Xuelong Li
VLM, CoGe
188
0
0
21 Oct 2025
NP-Engine: Empowering Optimization Reasoning in Large Language Models with Verifiable Synthetic NP Problems
Xiaozhe Li
Xinyu Fang
Shengyuan Ding
Linyang Li
Haodong Duan
Qingwen Liu
Kai Chen
OffRL, LRM
76
1
0
18 Oct 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang
Y. Ge
S. Yang
Yicheng Xiao
Huizi Mao
...
Hongxu Yin
Yao Lu
Xiaojuan Qi
Song Han
Yukang Chen
OffRL
90
0
0
13 Oct 2025
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Wenhan Ma
Hailin Zhang
Liang Zhao
Yifan Song
Yudong Wang
Zhifang Sui
Fuli Luo
MoE
187
3
0
13 Oct 2025
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Prasanna Mayilvahanan
Ricardo Dominguez-Olmedo
Thaddäus Wiedemer
Wieland Brendel
OffRL, AIMat, ReLM, LRM
177
0
0
13 Oct 2025
Diagnosing and Mitigating System Bias in Self-Rewarding RL
Chuyi Tan
Peiwen Yuan
Xinglin Wang
Yiwei Li
Shaoxiong Feng
...
Jiayi Shi
Ji Zhang
Boyuan Pan
Yao Hu
Kan Li
88
0
0
10 Oct 2025
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
Shuang Chen
Yue Guo
Yimeng Ye
Shijue Huang
Wenbo Hu
Haoxi Li
Manyuan Zhang
Jiayu Chen
Song Guo
Nanyun Peng
LRM
124
2
0
09 Oct 2025
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Leitian Tao
I. Kulikov
Swarnadeep Saha
Tianlu Wang
Jing Xu
Yixuan Li
Jason Weston
Ping Yu
OffRL, LRM
200
2
0
08 Oct 2025
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
Aochong Oliver Li
Tanya Goyal
LRM
84
0
0
07 Oct 2025
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
Honglin Lin
Qizhi Pei
Xin Gao
Zhuoshi Pan
Yu Li
J. Li
Conghui He
Lijun Wu
ALM, LRM
152
1
0
05 Oct 2025
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Xinpeng Wang
Nitish Joshi
Barbara Plank
Rico Angell
He He
LRM
187
0
1
01 Oct 2025
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Renjie Luo
Zichen Liu
Xiangyan Liu
Chao Du
Min Lin
Wenhu Chen
Wei Lu
Tianyu Pang
OffRL
112
2
0
26 Sep 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
S. Yu
Yuxin Chen
Hao Ju
Lianjie Jia
Fuxi Zhang
...
Lin Song
Lijun Wang
Yanwei Li
Y. Shan
Huchuan Lu
LRM
285
8
0
23 Sep 2025
EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving
Xiyuan Zhou
Xinlei Wang
Yirui He
Yang Wu
Ruixi Zou
...
Wenxuan Liu
Huan Zhao
Yan Xu
Jinjin Gu
Junhua Zhao
ELM, LRM
84
1
0
22 Sep 2025
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
Shuaijie She
Yu Bao
Y. Lu
Lu Xu
Tao Li
Wenhao Zhu
Shujian Huang
Shanbo Cheng
Lu Lu
Yuping Wang
140
1
0
20 Aug 2025
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
L. Chen
J. Gu
Daigang Xu
Wenhao Huang
Z. L. Jiang
...
Ge Zhang
Tianyun Zhao
Jianqiu Zhao
Yichi Zhou
Thomas Hanwen Zhu
AIMat, LRM
206
32
0
31 Jul 2025
BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning
Jinan Zhou
Rajat Ghosh
Vaishnavi Bhargava
Debojyoti Dutta
Aryan Singhal
152
0
0
31 Jul 2025
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Xingxuan Li
Yao Xiao
Dianwen Ng
Hai Ye
Y. Deng
...
Shihao Xu
Han Zhao
Weiling Chen
Feng Ji
Lidong Bing
OffRL, ReLM, LRM, VLM
175
5
0
19 Jul 2025
VAR-MATH: Probing True Mathematical Reasoning in LLMS via Symbolic Multi-Instance Benchmarks
Jian Yao
Ran Cheng
Kay Chen Tan
OffRL, LRM
100
2
0
17 Jul 2025
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Ling Team
Bin Hu
Cai Chen
Deng Zhao
Ding Liu
...
Zhenglei Zhou
Zhenyu Huang
Zhiqiang Zhang
Zihao Wang
Zujie Wen
OffRL, MoE, ALM, LRM
201
6
0
17 Jun 2025
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao
Ran Cheng
Xingyu Wu
Jibin Wu
Kay Chen Tan
LRM
227
16
0
29 May 2025
Can Large Reasoning Models Self-Train?
Sheikh Shafayat
Fahim Tajwar
Ruslan Salakhutdinov
J. Schneider
Andrea Zanette
ReLM, OffRL, LRM
345
19
0
27 May 2025
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen
Qianyu He
Siyu Yuan
Aili Chen
Zhicheng Cai
...
Qiying Yu
Xuefeng Li
Jiaze Chen
Hao Zhou
Mingxuan Wang
ReLM, LRM
305
21
0
26 May 2025
Generalizable Process Reward Models via Formally Verified Training Data
Ryo Kamoi
Yusen Zhang
Nan Zhang
Sarkar Snigdha Sarathi Das
Rui Zhang
OffRL, LRM
232
2
0
21 May 2025
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
David Dinucu-Jianu
Jakub Macina
Nico Daheim
Ido Hakimi
Iryna Gurevych
Mrinmaya Sachan
KELM, LRM
301
2
0
21 May 2025
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Jingyue Gao
Runji Lin
Keming Lu
Bowen Yu
Junyang Lin
Jianyu Chen
LRM
239
1
0
18 May 2025
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Yunjie Ji
Han Zhao
Xiangang Li
OffRL, LRM
186
0
0
04 May 2025
DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Yunjie Ji
Han Zhao
Xiangang Li
LRM
340
12
0
24 Apr 2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan
Jane Dwivedi-Yu
Song Jiang
Karthik Padthe
Yang Li
...
Ilia Kulikov
Dong Wang
Yuandong Tian
Jason Weston
Xian Li
ReLM, LRM
458
41
0
18 Feb 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang
Nan Yang
Liang Wang
Furu Wei
Fuli Feng
LRM
358
8
0
10 Feb 2025
Demystifying Long Chain-of-Thought Reasoning in LLMs
Edward Yeo
Yuxuan Tong
Morry Niu
Graham Neubig
Xiang Yue
OffRL, LRM
442
248
0
05 Feb 2025
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou
Xin Lv
Rui Lu
Jing Zhang
Yongqian Li
Zijun Yao
Juanzi Li
J. Tang
Yuxiao Dong
OffRL, LRM, ReLM
307
58
0
20 Jan 2025
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Robert Z. Sparks
Charlie Snell
Kanishk Gandhi
Alon Albalak
Anikait Singh
...
Dakota Mahan
Louis Castricato
Jan-Philipp Fränken
Nick Haber
Chelsea Finn
LRM
322
78
0
08 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Guang Dai
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM, SyDa, ReLM
308
236
0
08 Jan 2025
HARP: A challenging human-annotated math reasoning benchmark
Albert S. Yue
Lovish Madaan
Ted Moskovitz
DJ Strouse
Aaditya K. Singh
AIMat, LRM
144
15
0
11 Dec 2024
Free Process Rewards without Process Labels
Lifan Yuan
Wendi Li
Huayu Chen
Ganqu Cui
Ning Ding
Kaiyan Zhang
Bowen Zhou
Ziqiang Liu
Yuan Yao
OffRL
255
103
0
02 Dec 2024
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
International Conference on Learning Representations (ICLR), 2024
Iman Mirzadeh
Keivan Alizadeh
Hooman Shahrokhi
Oncel Tuzel
Samy Bengio
Mehrdad Farajtabar
AIMat, LRM
441
390
0
07 Oct 2024
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Arindam Mitra
Hamed Khanpour
Corby Rosset
Ahmed Hassan Awadallah
ALM, MoE, LRM
212
108
0
16 Feb 2024
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla
Sharath Raparthy
Christoforus Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Roberta Raileanu
ReLM, LRM
221
94
0
13 Feb 2024
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak
Liangming Pan
Colin Raffel
Wenjie Wang
297
64
0
05 Dec 2023
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM
842
2,103
0
31 May 2023
Are Emergent Abilities of Large Language Models a Mirage?
Neural Information Processing Systems (NeurIPS), 2023
Rylan Schaeffer
Alycia Lee
Oluwasanmi Koyejo
LRM
397
557
0
28 Apr 2023
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Amro Abbas
Kushal Tirumala
Daniel Simig
Surya Ganguli
Ari S. Morcos
222
230
0
16 Mar 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
2.2K
14,067
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
1.0K
6,589
0
27 Oct 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM, FaML
795
3,762
0
05 Mar 2021