2103.03874
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Papers citing
"Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,395 papers shown
Title
MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
J. Li
Lu Yu
Qing Cui
Zhiqiang Zhang
Jun Zhou
Yanfang Ye
Chuxu Zhang
59
0
0
19 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
42
1
0
19 Mar 2025
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
Honglin Lin
Zhuoshi Pan
Yu-Hu Li
Qizhi Pei
Xin Gao
Mengzhang Cai
Conghui He
Lijun Wu
OffRL
LRM
53
0
0
19 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
89
2
0
18 Mar 2025
COPA: Comparing the Incomparable to Explore the Pareto Front
Adrián Javaloy
Antonio Vergari
Isabel Valera
62
0
0
18 Mar 2025
The KoLMogorov Test: Compression by Code Generation
Ori Yoran
Kunhao Zheng
Fabian Gloeckle
Jonas Gehring
Gabriel Synnaeve
Taco Cohen
62
1
0
18 Mar 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Xinyu Fang
Z. Chen
Kai Lan
Lixin Ma
Shengyuan Ding
...
Zicheng Zhang
Guofeng Zhang
Haodong Duan
K. Chen
D. Lin
MLLM
58
1
0
18 Mar 2025
Command R7B Arabic: A Small, Enterprise Focused, Multilingual, and Culturally Aware Arabic LLM
Yazeed Alnumay
Alexandre Barbet
Anna Bialas
William Darling
Shaan Desai
...
Stephanie Howe
Olivia Lasche
Justin Lee
Anirudh Shrinivason
Jennifer Tracey
86
0
0
18 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
53
0
0
18 Mar 2025
Pensez: Less Data, Better Reasoning -- Rethinking French LLM
Huy Hoang Ha
ReLM
LRM
66
1
0
17 Mar 2025
DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
Dengyun Peng
Yuhang Zhou
Qiguang Chen
Jinhao Liu
Jingjing Chen
L. Qin
50
0
0
17 Mar 2025
ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Fangzhi Xu
Hang Yan
Chang Ma
Haiteng Zhao
Jun Liu
Qika Lin
Zhiyong Wu
44
2
0
17 Mar 2025
Measuring In-Context Computation Complexity via Hidden State Prediction
Vincent Herrmann
Róbert Csordás
Jürgen Schmidhuber
39
0
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
88
2
0
17 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
44
2
0
17 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
152
0
0
16 Mar 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
Tsz Chung Cheng
Chung Shing Cheng
Chaak Ming Lau
Eugene Tin-Ho Lam
Chun Yat Wong
Hoi On Yu
Cheuk Hei Chong
ELM
59
1
0
16 Mar 2025
RaSA: Rank-Sharing Low-Rank Adaptation
Zhiwei He
Zhaopeng Tu
Xing Wang
Xingyu Chen
Z. Wang
Jiahao Xu
Tian Liang
Wenxiang Jiao
Z. Zhang
Rui Wang
ALM
82
1
0
16 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
A. Vosoughi
C. L. P. Chen
Chenliang Xu
LRM
64
1
0
14 Mar 2025
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
LRM
41
0
0
14 Mar 2025
GNNs as Predictors of Agentic Workflow Performances
Y. Zhang
Yuchen Hou
Bohan Tang
Shuo Chen
Muhan Zhang
Xiaowen Dong
S. Chen
LLMAG
AI4CE
60
0
0
14 Mar 2025
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
S. M. I. Simon X. Yang
C. Wang
Yidong Wang
Xiaotao Gu
Minlie Huang
J. Tang
LRM
LLMAG
59
0
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
35
1
0
13 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
54
23
1
13 Mar 2025
"Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding
Hyunbin Jin
Je Won Yeom
Seunghyun Bae
Taesup Kim
LRM
ReLM
37
1
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
60
10
0
13 Mar 2025
Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models
Afrar Jahin
Arif Hassan Zidan
Yu Bao
Shizhe Liang
T. Liu
W. Zhang
LRM
61
1
0
13 Mar 2025
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model
Bowen Zhang
Pengcheng Luo
LRM
AI4CE
LLMAG
68
1
0
13 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Ziyu Wan
Yunxiang Li
Y. Song
Hanjing Wang
Linyi Yang
Mark W. Schmidt
J. Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
84
6
0
12 Mar 2025
MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions
Zhe Xu
Daoyuan Chen
Zhenqing Ling
Yaliang Li
Ying Shen
ReLM
SyDa
LRM
51
0
0
12 Mar 2025
Reinforcement Learning is all You Need
Yongsheng Lian
ReLM
OffRL
LRM
70
0
0
12 Mar 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng
Yizhong Wang
Hannaneh Hajishirzi
Pang Wei Koh
ELM
53
3
0
11 Mar 2025
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
Ce Guo
Tong Zhao
61
1
0
11 Mar 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
LRM
ReLM
62
6
0
11 Mar 2025
RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware
Gonzalo Santamaría Gómez
Guillem García Subies
Pablo Gutiérrez Ruiz
Mario González Valero
Natàlia Fuertes
...
Nuria Aldama García
David Betancur Sánchez
Kateryna Sushkova
Marta Guerrero Nieto
Á. Jiménez
51
0
0
11 Mar 2025
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
Minjun Zhu
Yixuan Weng
Linyi Yang
Yue Zhang
ALM
LRM
58
2
0
11 Mar 2025
Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
Runxi Cheng
Feng Xiong
Yongxian Wei
Wanyun Zhu
Chun Yuan
MoMe
59
0
0
11 Mar 2025
Dynamic Path Navigation for Motion Agents with LLM Reasoning
Yubo Zhao
Qi Wu
Yifan Wang
Yu-Wing Tai
Chi-Keung Tang
LRM
LLMAG
107
0
0
10 Mar 2025
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
Huilin Deng
Ding Zou
Rui Ma
Hongchen Luo
Yang Cao
Yu Kang
LRM
VLM
52
8
0
10 Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko
Tianyi Chen
Sungnyun Kim
Tianyu Ding
Luming Liang
Ilya Zharkov
Se-Young Yun
VLM
101
0
0
10 Mar 2025
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Bo Jiang
Shaoyu Chen
Qian Zhang
Wenyu Liu
Xinggang Wang
OffRL
LRM
VLM
71
2
0
10 Mar 2025
Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu
Zihan Wang
Zichen Zhu
Lei Pan
Xingyu Chen
L. Chen
Kai Yu
47
0
0
09 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Y. Liu
Jin Jiang
M. Zhang
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
K. Zhang
ELM
92
0
0
09 Mar 2025
Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform
C. Huang
Peng Ye
X. Wang
Shenghe Zheng
Biqing Qi
Lei Bai
Wanli Ouyang
Tao Chen
31
0
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
68
0
0
08 Mar 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu
Haohao Gao
Renshou Wu
Shuai Ren
Xiaoxin Chen
Hongsheng Li
Fangyuan Li
ELM
49
0
0
08 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
Liang Lin
LRM
54
2
0
08 Mar 2025
Speculative Decoding for Multi-Sample Inference
Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
X. Wang
...
Ji Zhang
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
LRM
38
0
0
07 Mar 2025
Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning
Jiachun Li
Pengfei Cao
Yubo Chen
Jiexin Xu
Huaijun Li
Xiaojian Jiang
Kang Liu
Jun Zhao
LRM
44
0
0
07 Mar 2025