Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.03874
Cited By
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,395 papers shown
Title
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
38
0
0
31 Mar 2025
Entropy-Based Adaptive Weighting for Self-Training
Xiaoxuan Wang
Yihe Deng
Mingyu Derek Ma
Wei Wang
LRM
45
0
0
31 Mar 2025
Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
Jianhua Sun
Jiude Wei
Y. Li
Cewu Lu
LM&Ro
54
1
0
30 Mar 2025
Codehacks: A Dataset of Adversarial Tests for Competitive Programming Problems Obtained from Codeforces
Max Hort
Leon Moonen
39
0
0
30 Mar 2025
ToRL: Scaling Tool-Integrated RL
Xuefeng Li
Haoyang Zou
Pengfei Liu
OffRL
LRM
39
3
0
30 Mar 2025
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song
Xuwei Ding
Jieyu Zhang
Taiwei Shi
Ryotaro Shimizu
Rahul Gupta
Y. Liu
Jian Kang
Jieyu Zhao
KELM
54
0
0
30 Mar 2025
Efficient Inference for Large Reasoning Models: A Survey
Y. Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
LLMAG
LRM
62
7
0
29 Mar 2025
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
Anastasiia Fadeeva
Vincent Coriou
Diego Antognini
C. Musat
Andrii Maksai
45
0
0
29 Mar 2025
Process Reward Modeling with Entropy-Driven Uncertainty
Lang Cao
Renhong Chen
Yingtian Zou
Chao Peng
Wu Ning
...
Y. Wang
Peishuo Su
Mofan Peng
Zijie Chen
Yitong Li
34
0
0
28 Mar 2025
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
Belinda Z. Li
Been Kim
Z. Wang
LRM
38
2
0
28 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
80
0
0
28 Mar 2025
SWI: Speaking with Intent in Large Language Models
Yuwei Yin
EunJeong Hwang
Giuseppe Carenini
LRM
44
0
0
27 Mar 2025
Boosting Large Language Models with Mask Fine-Tuning
M. Zhang
Yue Bai
Huan Wang
Yizhou Wang
Qihua Dong
Y. Fu
CLL
48
0
0
27 Mar 2025
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
51
0
0
27 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
60
0
0
27 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
W. Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Y. Zhuang
LM&Ro
LRM
65
2
0
27 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
80
12
0
27 Mar 2025
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Haoxiang Sun
Yingqian Min
Z. Chen
Wayne Xin Zhao
Zheng Liu
Z. Wang
Lei Fang
Ji-Rong Wen
ELM
LRM
47
2
0
27 Mar 2025
Entropy-Aware Branching for Improved Mathematical Reasoning
Xianzhi Li
Ethan Callanan
Xiaodan Zhu
Mathieu Sibue
Antony Papadimitriou
Mahmoud Mahfouz
Zhiqiang Ma
Xiaomo Liu
LRM
37
0
0
27 Mar 2025
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Page-Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
68
1
0
27 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL
LRM
57
36
0
26 Mar 2025
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
X. Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
86
13
0
26 Mar 2025
Learning to chain-of-thought with Jensen's evidence lower bound
Yunhao Tang
Sid Wang
Rémi Munos
BDL
OffRL
LRM
50
0
0
25 Mar 2025
Scaling Laws of Synthetic Data for Language Models
Zeyu Qin
Qingxiu Dong
Xingxing Zhang
Li Dong
Xiaolong Huang
...
Hany Awadalla
Yi R. Fung
Weizhu Chen
Minhao Cheng
Furu Wei
SyDa
73
1
0
25 Mar 2025
Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin
Rishab Balasubramanian
Fengyuan Liu
Nikhil Kandpal
Tu Vu
59
0
0
25 Mar 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
85
26
0
25 Mar 2025
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
Yunhao Tang
Kunhao Zheng
Gabriel Synnaeve
Rémi Munos
39
0
0
25 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
48
2
0
25 Mar 2025
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Yuyao Ge
Shenghua Liu
Y. Wang
Lingrui Mei
Lizhe Chen
Baolong Bi
Xueqi Cheng
ReLM
LRM
49
2
0
25 Mar 2025
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training
Han Zhao
Haotian Wang
Yiping Peng
Sitong Zhao
Xiaoyu Tian
Shuaiting Chen
Yunjie Ji
Xiangang Li
RALM
ReLM
LRM
70
8
0
25 Mar 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
42
1
0
25 Mar 2025
LogicLearner: A Tool for the Guided Practice of Propositional Logic Proofs
Amogh Inamdar
U. Macar
Michel Vazirani
Michael Tarnow
Zarina Mustapha
Natalia Dittren
Sam Sadeh
Nakul Verma
Ansaf Salleb-Aouissi
LRM
35
0
0
25 Mar 2025
Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer
Ivan Vulić
E. Ponti
87
0
0
25 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
91
28
0
24 Mar 2025
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
J. Li
Jie Zhou
Yutao Yang
Bihao Zhan
Qianjun Pan
Yuyang Ding
Qin Chen
Jiang Bo
Xin Lin
Liang He
LRM
57
0
0
24 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
57
2
0
24 Mar 2025
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin
Lei Ji
Xiao Liu
Yeyun Gong
49
0
0
24 Mar 2025
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li
Rushi Qiang
Lama Moukheiber
Chao Zhang
LRM
46
0
0
24 Mar 2025
Long Is More Important Than Difficult for Training Reasoning Models
Si Shen
Fei Huang
Zhixiao Zhao
C. Liu
Tiansheng Zheng
Danhao Zhu
AIMat
RALM
LRM
57
0
0
23 Mar 2025
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse
Ling Team
Wenting Cai
Yuchen Cao
C. Chen
...
Wei Zhang
Z. Zhang
Hailin Zhao
Xunjin Zheng
Jun Zhou
ALM
MoE
49
0
0
22 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
40
0
0
22 Mar 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu-Hu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Yu Wu
Chenlin Ming
H. V. Zhao
Conghui He
Lijun Wu
LRM
59
0
0
21 Mar 2025
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Mingyang Song
Mao Zheng
Zheng Li
Wenjie Yang
Xuan Luo
Yue Pan
Feng Zhang
ReLM
LRM
78
4
0
21 Mar 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
54
1
0
20 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen
Zhong
Hanjie Chen
OffRL
ReLM
LRM
74
21
0
20 Mar 2025
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs
Masoud Hashemi
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudhan
Jishnu Sethumadhavan Nair
Aman Tiwari
Vikas Yadav
ReLM
ELM
LRM
61
2
0
20 Mar 2025
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo
Yulin Song
Xingyao Zhang
Jiaheng Liu
Weixun Wang
Gengru Chen
Wenbo Su
Bo Zheng
LRM
58
4
0
20 Mar 2025
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang
Chris Ngo
OffRL
LRM
47
8
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at
ResearchTrend Connect | LLMAG
on
07 May 2025
93
6
0
20 Mar 2025
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Chen Li
Nazhou Liu
Kai Yang
38
3
0
20 Mar 2025
Previous
1
2
3
4
5
...
26
27
28
Next