Measuring Mathematical Problem Solving With the MATH Dataset

5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,395 papers shown
Title
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
C. Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
109
2
0
07 Mar 2025
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Z. Chen
Yingqian Min
Beichen Zhang
Jie Chen
Jinhao Jiang
...
Xu Miao
Y. Lu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
ReLM
OffRL
LRM
81
15
0
06 Mar 2025
LLMs Can Generate a Better Answer by Aggregating Their Own Responses
Zichong Li
Xinyu Feng
Yuheng Cai
Zixuan Zhang
Tianyi Liu
Chen Liang
Weizhu Chen
Haoyu Wang
T. Zhao
LRM
50
1
0
06 Mar 2025
How to Mitigate Overfitting in Weak-to-strong Generalization?
Junhao Shi
Qinyuan Cheng
Zhaoye Fei
Y. Zheng
Qipeng Guo
Xipeng Qiu
65
0
0
06 Mar 2025
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen
J. Zhang
Jieyun Huang
Shuming Shi
Wenjing Zhang
Jiangze Yan
Ning Wang
Kai Wang
Shiguo Lian
LRM
75
12
0
06 Mar 2025
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li
Yinyi Luo
Anudeep Bolimera
Uzair Ahmed
S.
Hrishikesh Gokhale
Marios Savvides
LRM
AI4CE
60
1
0
06 Mar 2025
Factorio Learning Environment
Jack Hopkins
Mart Bakler
Akbir Khan
LRM
AI4CE
LLMAG
50
0
0
06 Mar 2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma
Niyar R. Barman
Nicholas M. Asher
Akshay Chaturvedi
LRM
AIMat
67
0
0
06 Mar 2025
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
E. Liu
Amanda Bertsch
Lintang Sutawika
Lindia Tjuatja
Patrick Fernandes
...
S.
Carolin (Haas) Lawrence
Aditi Raghunathan
Kiril Gashteovski
Graham Neubig
70
0
0
05 Mar 2025
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
Rui Ye
Shuo Tang
Rui Ge
Yaxin Du
Zhenfei Yin
S. Chen
Jing Shao
LLMAG
87
1
0
05 Mar 2025
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
Jiarui Yao
Ruida Wang
Tong Zhang
LRM
55
0
0
05 Mar 2025
Extrapolation Merging: Keep Improving With Extrapolation and Merging
Yiguan Lin
Bin Xu
Yinghao Li
Yang Gao
MoMe
57
1
0
05 Mar 2025
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
Ruida Wang
Rui Pan
Yuxin Li
Jipeng Zhang
Yizhen Jia
Shizhe Diao
Renjie Pi
Junjie Hu
Tong Zhang
LRM
LLMAG
78
5
0
05 Mar 2025
Process-based Self-Rewarding Language Models
Shimao Zhang
Xiao Liu
Xin Zhang
Junxiao Liu
Zheheng Luo
Shujian Huang
Yeyun Gong
ReLM
SyDa
LRM
93
2
0
05 Mar 2025
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems
Mahfuz Ahmed Anik
Abdur Rahman
Azmine Toushik Wasi
Md Manjurul Ahsan
47
0
0
05 Mar 2025
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
Wei Sun
Qianlong Du
Fuwei Cui
Jiajun Zhang
OffRL
LRM
31
0
0
04 Mar 2025
Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent
Xingzuo Li
Kehai Chen
Yunfei Long
X. Bai
Yong-mei Xu
Min Zhang
LRM
LLMAG
79
1
0
04 Mar 2025
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models
Joykirat Singh
Tanmoy Chakraborty
A. Nambi
AI4Cl
LRM
ReLM
55
1
0
04 Mar 2025
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu
Yongcheng Jing
Yingjie Wang
Wenbin Hu
Dacheng Tao
RALM
LRM
64
2
0
03 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Y. Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Z. Zhang
Fei Huang
Rui Wang
BDL
LRM
62
6
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
21
0
03 Mar 2025
Toward Stable and Consistent Evaluation Results: A New Methodology for Base Model Evaluation
Hongzhi Luan
Changxin Tian
Zhaoxin Huan
Xiaolu Zhang
Kunlong Chen
Zhiqiang Zhang
Jun Zhou
40
1
0
02 Mar 2025
Evaluating Polish linguistic and cultural competency in large language models
Sławomir Dadas
Małgorzata Grębowiec
Michał Perełkiewicz
Rafał Poświata
ELM
39
1
0
02 Mar 2025
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Miao Peng
Nuo Chen
Zongrui Suo
Jia Li
LRM
31
0
0
02 Mar 2025
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Yichang Xu
Ling Liu
53
8
0
01 Mar 2025
ProBench: Benchmarking Large Language Models in Competitive Programming
Lei Yang
Renren Jin
Ling Shi
Jianxiang Peng
Yue Chen
Deyi Xiong
ReLM
ELM
LRM
53
2
0
28 Feb 2025
Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu
Weiyu Huang
Zichen Liang
C. L. P. Chen
Jintao Zhang
J. Zhu
Jianfei Chen
MQ
39
2
0
28 Feb 2025
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
Grigor Nalbandyan
Rima Shahbazyan
Evelina Bakhturina
ELM
33
0
0
28 Feb 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
P. Wang
Zhongzhi Li
Fei Yin
Dekang Ran
Chenglin Liu
Cheng-Lin Liu
LRM
42
3
0
28 Feb 2025
Digital Player: Evaluating Large Language Models based Human-like Agent in Games
J. T. Wang
Kai Wang
Shaojie Lin
Runze Wu
Bihan Xu
...
Zhipeng Hu
Z. Fan
Le Li
Tangjie Lyu
Changjie Fan
LLMAG
ELM
AI4CE
53
1
0
28 Feb 2025
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
Kechen Li
Wenqi Zhu
Coralia Cartis
Tianbo Ji
Shiwei Liu
ReLM
LRM
44
0
0
27 Feb 2025
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
Guizhen Chen
Weiwen Xu
Hao Zhang
Hou Pong Chan
Chaoqun Liu
Lidong Bing
Deli Zhao
Anh Tuan Luu
Yu Rong
ReLM
LRM
51
3
0
27 Feb 2025
LangProBe: a Language Programs Benchmark
Shangyin Tan
Lakshya A Agrawal
Arnav Singhvi
Liheng Lai
Michael J Ryan
Dan Klein
Omar Khattab
Koushik Sen
Matei A. Zaharia
64
0
0
27 Feb 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
44
5
0
27 Feb 2025
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Daniele Paliotta
Junxiong Wang
Matteo Pagliardini
Kevin Y. Li
Aviv Bick
J. Zico Kolter
Albert Gu
F. Fleuret
Tri Dao
ReLM
LRM
43
7
0
27 Feb 2025
Self-Training Elicits Concise Reasoning in Large Language Models
Tergel Munkhbat
Namgyu Ho
S. Kim
Yongjin Yang
Yujin Kim
Se-Young Yun
ReLM
LRM
54
10
0
27 Feb 2025
Revisiting Self-Consistency from Dynamic Distributional Alignment Perspective on Answer Aggregation
Yiwei Li
Ji Zhang
Shaoxiong Feng
Peiwen Yuan
X. Wang
...
Y. Zhang
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
HILM
42
1
0
27 Feb 2025
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
Franck Cappello
Sandeep Madireddy
Robert Underwood
N. Getty
Nicholas Chia
...
M. Rafique
Eliu A. Huerta
B. Li
Ian Foster
Rick L. Stevens
72
1
0
27 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
81
3
0
26 Feb 2025
MathClean: A Benchmark for Synthetic Mathematical Data Cleaning
Hao Liang
Meiyi Qiang
Y. Li
Zefeng He
Yongzhen Guo
Z. Zhu
Wentao Zhang
Bin Cui
33
0
0
26 Feb 2025
CritiQ: Mining Data Quality Criteria from Human Preferences
Honglin Guo
Kai Lv
Qipeng Guo
Tianyi Liang
Zhiheng Xi
...
Qiuyinzhe Zhang
Y. Sun
K. Chen
Xipeng Qiu
Tao Gui
33
0
0
26 Feb 2025
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Max W.F. Ku
Thomas Chong
Jonathan Leung
Krish Shah
Alvin Yu
Wenhu Chen
LRM
88
3
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
Learning to Generate Structured Output with Schema Reinforcement Learning
Y. Lu
Haolun Li
Xin Cong
Zhong Zhang
Yesai Wu
Yankai Lin
Zhiyuan Liu
Fangming Liu
Maosong Sun
39
1
0
26 Feb 2025
Multi-LLM Collaborative Search for Complex Problem Solving
Sen Yang
Yafu Li
Wai Lam
Yu Cheng
LLMAG
LRM
68
1
0
26 Feb 2025
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval
Jiarong Wu
Songqiang Chen
Jialun Cao
Hau Ching Lo
S. Cheung
51
0
0
26 Feb 2025
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation
Humza Sami
Mubashir ul Islam
Samy Charas
Asav Gandhi
P. Gaillardon
V. Tenace
LLMAG
74
0
0
26 Feb 2025
Self-rewarding correction for mathematical reasoning
Wei Xiong
Hanning Zhang
Chenlu Ye
Lichang Chen
Nan Jiang
Tong Zhang
ReLM
KELM
LRM
67
9
0
26 Feb 2025
CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
Zongzhen Yang
Binhang Qi
Hailong Sun
Wenrui Long
Ruobing Zhao
Xiang Gao
MoMe
48
0
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
88
1
0
26 Feb 2025