2103.03874
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Papers citing
"Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,395 papers shown
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Xiaoyang Chen
Xinan Dai
Yu Du
Qian Feng
Naixu Guo
...
J. Xu
Yiyang Yu
Z. Yang
Hongji Zha
Ruichong Zhang
LRM
24
0
0
13 May 2025
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
Nicholas Attolino
Alessio Capitanelli
Fulvio Mastrogiovanni
17
0
0
13 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
18
0
0
12 May 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
Junjie Ye
Caishuang Huang
Z. Chen
Wenjie Fu
Chenyuan Yang
...
Tao Gui
Qi Zhang
Zhongchao Shi
Jianping Fan
Xuanjing Huang
ALM
21
0
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
16
0
0
12 May 2025
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
Xiaotian Lin
Yanlin Qi
Yizhang Zhu
Themis Palpanas
Chengliang Chai
Nan Tang
Yuyu Luo
16
0
0
12 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
B. S.
Cici
Dawei Zhu
...
Y. Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
27
0
0
12 May 2025
Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection
Pei-Fu Guo
Yun-Da Tsai
Shou-De Lin
UD
36
0
0
12 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
18
0
0
12 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
48
0
0
10 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Hao Wu
H. Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
23
0
0
09 May 2025
AgentXploit: End-to-End Redteaming of Black-Box AI Agents
Zhun Wang
Vincent Siu
Zhe Ye
Tianneng Shi
Yuzhou Nie
Xuandong Zhao
Chenguang Wang
Wenbo Guo
Dawn Song
LLMAG
AAML
33
0
0
09 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
21
3
0
09 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka
Jimmy Dani
Ausmit Mondal
Nitesh Saxena
AAML
25
0
0
08 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
47
0
0
08 May 2025
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
Ziqing Qiao
Yongheng Deng
Jiali Zeng
Dong Wang
Lai Wei
Fandong Meng
Jie Zhou
Ju Ren
Yaoxue Zhang
LRM
47
0
0
08 May 2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong
Bilge Acun
Beidi Chen
Yuejie Chi
LRM
11
0
0
08 May 2025
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Y. Wang
Z. Fu
Jie Cai
Peijun Tang
Hongya Lyu
...
Jie Zhou
Guoyang Zeng
Chaojun Xiao
Xu Han
Zhiyuan Liu
47
0
0
08 May 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Qi Liu
Xinhao Zheng
Renqiu Xia
Xingzhi Qi
Qinxiang Cao
Junchi Yan
AIMat
45
0
0
07 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
S. Song
Ce Zhang
James Y. Zou
ALM
27
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
67
1
0
05 May 2025
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao
Yi Shen
Shuming Shi
Jieyun Huang
Z. Chen
Ning Wang
Siqi Xiao
J. Zhang
Kai Wang
Shiguo Lian
MQ
39
0
0
05 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
50
0
0
05 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
43
0
0
05 May 2025
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
45
0
0
05 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
53
0
0
04 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
57
0
0
03 May 2025
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Madhav Kotecha
Vijendra Kumar Vaishya
Smita Gautam
Suraj Racha
27
0
0
02 May 2025
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
Daniel Weitekamp
M. N. Siddiqui
Christopher James Maclellan
LLMAG
ELM
23
0
0
02 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
44
0
0
02 May 2025
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai
Zhiqiang Xie
Kayvon Fatahalian
LLMAG
68
0
0
01 May 2025
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
Daria Gitman
Igor Gitman
Evelina Bakhturina
SyDa
44
0
0
01 May 2025
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM
LRM
30
0
0
01 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
45
0
0
01 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
57
0
0
01 May 2025
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
Jinyan Su
Jennifer Healey
Preslav Nakov
Claire Cardie
LRM
76
0
0
30 Apr 2025
AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models
Yinghui He
A. Panigrahi
Yong Lin
Sanjeev Arora
36
0
0
30 Apr 2025
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges
Xiao Xiao
Yu Su
Sijing Zhang
Zhang Chen
Yadong Chen
Tian Liu
32
0
0
30 Apr 2025
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
H. Luo
Haiying He
Y. Wang
Jinluan Yang
Rui Liu
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
Li Shen
LRM
26
0
0
30 Apr 2025
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
J. Wang
Jinhao Jiang
Zhiqiang Zhang
Jun Zhou
Wayne Xin Zhao
SyDa
53
0
0
29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
110
2
0
29 Apr 2025
Turing Machine Evaluation for Large Language Model
Haitao Wu
Zongbo Han
Huaxi Huang
Changqing Zhang
ELM
LRM
59
0
0
29 Apr 2025
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi
Abdalgader Abubaker
M. Seddik
Hakim Hacid
Salem Lahlou
LRM
54
0
0
28 Apr 2025
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura
Taro Yano
Masafumi Enomoto
M. Oyamada
39
0
0
28 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
61
0
0
28 Apr 2025
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
J. Zhang
Flood Sung
Z. Yang
Yang Gao
Chongjie Zhang
LLMAG
38
0
0
28 Apr 2025
Security Steerability is All You Need
Itay Hazan
Idan Habler
Ron Bitton
Itsik Mantin
AAML
78
0
0
28 Apr 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
J. Li
Lijun Wu
M. Zhang
LLMAG
LRM
64
1
0
27 Apr 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
Y. Li
Qizhi Pei
Mengyuan Sun
Honglin Lin
Chenlin Ming
Xin Gao
Jiang Wu
C. He
Lijun Wu
ELM
LRM
40
0
0
27 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
X. Li
Kwan-Yee Kenneth Wong
LLMAG
ReLM
LRM
82
0
0
27 Apr 2025