Measuring Mathematical Problem Solving With the MATH Dataset

5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,395 papers shown
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
88
1
0
26 Feb 2025
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
Xinghao Chen
Zhijing Sun
Wenjin Guo
Miaoran Zhang
Yanjun Chen
...
Hui Su
Yijie Pan
Dietrich Klakow
Wenjie Li
Xiaoyu Shen
LRM
51
4
0
25 Feb 2025
DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
Tianyi Zhuang
Chuqiao Kuang
Xiaoguang Li
Yihua Teng
Jihao Wu
Y. Wang
Lifeng Shang
RALM
ELM
LRM
72
0
0
25 Feb 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang
Shuming Ma
Yankai Lin
Furu Wei
LRM
48
21
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
80
1
0
25 Feb 2025
AgentRM: Enhancing Agent Generalization with Reward Modeling
Yu Xia
Jingru Fan
Weize Chen
Siyu Yan
Xin Cong
Zhong Zhang
Y. Lu
Yankai Lin
Zhiyuan Liu
Maosong Sun
49
1
0
25 Feb 2025
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
Siqi Guo
Ilgee Hong
Vicente Balmaseda
Changlong Yu
Liang Qiu
Xin Liu
Haoming Jiang
Tuo Zhao
Tianbao Yang
43
0
0
25 Feb 2025
Unveiling and Causalizing CoT: A Causal Pespective
Jiarun Fu
LiZhong Ding
Hao Li
P. Li
Qiuning Wei
Xu Chen
LRM
76
0
0
25 Feb 2025
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning
Frederikus Hudi
Genta Indra Winata
Ruochen Zhang
Alham Fikri Aji
ReLM
LRM
80
2
0
25 Feb 2025
CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization
Shuming Shi
Ruobing Zuo
Gaolei He
Jianlin Wang
Chenyang Xu
Zhengfeng Yang
60
0
0
25 Feb 2025
LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena
Tianmi Ma
Jiawei Du
Wenxin Huang
Wenjie Wang
Liang Xie
X. Zhong
Joey Tianyi Zhou
62
2
0
25 Feb 2025
Scalable Best-of-N Selection for Large Language Models via Self-Certainty
Zhewei Kang
Xuandong Zhao
Dawn Song
LRM
62
2
0
25 Feb 2025
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Alon Albalak
Duy Phung
Nathan Lile
Rafael Rafailov
Kanishk Gandhi
...
Anikait Singh
Chase Blagden
Violet Xiang
Dakota Mahan
Nick Haber
OffRL
LRM
45
4
0
24 Feb 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
94
0
0
24 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
36
4
0
24 Feb 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
57
2
0
24 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
41
0
0
24 Feb 2025
PersonaMath: Boosting Mathematical Reasoning via Persona-Driven Data Augmentation
Jing Luo
Longze Chen
Run Luo
Liang Zhu
Chang Ao
...
A. Argha
Hamid Alinejad-Rokny
Chengming Li
Shiwen Ni
Min Yang
SyDa
AIMat
82
0
0
24 Feb 2025
Training a Generally Curious Agent
Fahim Tajwar
Yiding Jiang
Abitha Thankaraj
Sumaita Sadia Rahman
J. Zico Kolter
Jeff Schneider
Ruslan Salakhutdinov
118
1
0
24 Feb 2025
Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning
Raghav Singhal
Kaustubh Ponkshe
Rohit Vartak
Lav R. Varshney
Praneeth Vepakomma
FedML
74
0
0
24 Feb 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
64
3
0
24 Feb 2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan
Jane Dwivedi-Yu
Song Jiang
Karthik Padthe
Yang Li
...
Ilia Kulikov
Kyunghyun Cho
Yuandong Tian
Jason Weston
Xian Li
ReLM
LRM
51
10
0
24 Feb 2025
NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models
Yibo Zhong
Haoxiang Jiang
Lincan Li
Ryumei Nakada
Tianci Liu
Linjun Zhang
Huaxiu Yao
Haoyu Wang
75
2
0
24 Feb 2025
Large Language Models and Mathematical Reasoning Failures
Johan Boye
Birger Moell
ELM
LRM
45
1
0
24 Feb 2025
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light
Wei Cheng
Wu Yue
Masafumi Oyamada
Mengdi Wang
Santiago Paternain
Haifeng Chen
ReLM
LRM
56
1
0
23 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
H. Li
Tao Xie
Baishakhi Ray
41
2
0
23 Feb 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding
Wentao Jiang
Shunyu Liu
Yongcheng Jing
J. Guo
...
Zengmao Wang
Z. Liu
Bo Du
X. Liu
Dacheng Tao
LRM
44
4
0
22 Feb 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
51
3
0
22 Feb 2025
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
Thanh Le-Cong
Bach Le
Toby Murray
LRM
39
1
0
22 Feb 2025
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
E. Davis
S. Aaronson
ELM
122
21
0
21 Feb 2025
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale
Axel Højmark
Jérémy Scheurer
Marius Hobbhahn
LLMAG
ELM
41
1
0
21 Feb 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Rui Ye
Xiaowen Dong
Y. Wang
Yanfeng Wang
S. Chen
SyDa
LLMAG
127
3
0
21 Feb 2025
CER: Confidence Enhanced Reasoning in LLMs
Ali Razghandi
Seyed Mohsen Hosseini
Mahdieh Soleymani Baghshah
LRM
98
2
0
21 Feb 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Z. Chen
Mingxiao Li
Shangsong Liang
Z. Ren
V. Honavar
93
5
0
21 Feb 2025
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman
Michal Golovanesky
Amir Bar
Vedant Palit
Yann LeCun
Carsten Eickhoff
Ritambhara Singh
LRM
47
2
0
21 Feb 2025
S^3cMath: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Yuchen Yan
Jin Jiang
Yang Liu
Yixin Cao
Xin Xu
M. Zhang
Xunliang Cai
Jian Shao
ReLM
LRM
KELM
110
7
0
21 Feb 2025
Improving Value-based Process Verifier via Structural Prior Injection
Zetian Sun
Dongfang Li
Baotian Hu
Jun Yu
Min-Ling Zhang
35
0
0
21 Feb 2025
Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques
Sangjun Han
Taeil Hur
Youngmi Hur
Kathy Sangkyung Lee
Myungyoon Lee
Hyojae Lim
90
0
0
20 Feb 2025
InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua
Tyler Wong
Sun Fei
Liangming Pan
Adam Jardine
William Yang Wang
LRM
70
2
0
20 Feb 2025
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
Shuqi Liu
Han Wu
Bowei He
Xiongwei Han
M. Yuan
Linqi Song
MoMe
47
1
0
20 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
B. Wang
Weipeng Chen
VLM
148
1
0
20 Feb 2025
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
Daniel J.H. Chung
Zhiqi Gao
Yurii Kvasiuk
Tianyi Li
Moritz Münchmeyer
Maja Rudolph
Frederic Sala
Sai Chaitanya Tadepalli
AIMat
44
3
0
19 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Z. Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Y. Wang
44
0
0
19 Feb 2025
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights
Shubham Parashar
Blake Olson
Sambhav Khurana
Eric Li
Hongyi Ling
James Caverlee
Shuiwang Ji
LRM
ReLM
87
8
0
18 Feb 2025
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
57
2
0
18 Feb 2025
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu
Luca Viano
Yihang Chen
Zhenyu Zhu
Kimon Antonakopoulos
Quanquan Gu
V. Cevher
49
0
0
18 Feb 2025
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco
ELM
58
2
0
18 Feb 2025
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
49
4
0
17 Feb 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Y. Wang
Yichun Yin
Y. Wang
Lifeng Shang
Q. Liu
LRM
68
2
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
100
14
0
17 Feb 2025