2103.03874
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Papers citing
"Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,395 papers shown
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
84
2
0
14 Apr 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Aryan Shrivastava
Paula Akemi Aoyagui
29
0
0
14 Apr 2025
Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Zuoli Tang
Junjie Ou
Kaiqin Hu
Chunwei Wu
Zhaoxin Huan
Chilin Fu
Xiaolu Zhang
Jun Zhou
Chenliang Li
ReLM
LRM
35
0
0
13 Apr 2025
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
Yutao Mou
Yuxiao Luo
Shikun Zhang
Wei Ye
LLMSV
LRM
36
0
0
13 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM
LM&MA
55
0
0
13 Apr 2025
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Chenghao Li
Chaoning Zhang
Yi Lu
J. Zhang
Qigan Sun
X. Wang
Jiwei Wei
Guoqing Wang
Yang Yang
H. Shen
LRM
60
1
0
13 Apr 2025
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
27
0
0
13 Apr 2025
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
Xin Gao
Qizhi Pei
Zinan Tang
Y. Li
Honglin Lin
Jiang Wu
C. He
Lijun Wu
SyDa
28
0
0
11 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He
Jiahong Liu
Buze Zhang
N. Bui
Ali Maatouk
Menglin Yang
Irwin King
Melanie Weber
Rex Ying
29
0
0
11 Apr 2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
FangZhi Xu
Hang Yan
Chang Ma
Haiteng Zhao
Qiushi Sun
Kanzhi Cheng
Junxian He
Jun Liu
Zhiyong Wu
LRM
24
1
0
11 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Y. Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
30
2
0
10 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
64
2
0
10 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
C. Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
72
1
0
10 Apr 2025
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models
Ling Team
Caizhi Tang
Chilin Fu
Chunwei Wu
Jia Guo
...
Shuaicheng Li
Y. Zhang
Yingting Wu
Y. Liu
Zhenyu Huang
LRM
19
0
0
09 Apr 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan
Ming Li
Lichao Sun
Tianyi Zhou
LRM
51
2
0
09 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
74
4
0
09 Apr 2025
SEA-LION: Southeast Asian Languages in One Network
Raymond Ng
Thanh Ngan Nguyen
Yuli Huang
Ngee Chia Tai
Wai Yi Leong
...
David Ong Tat-Wee
B. Liu
William-Chandra Tjhi
Erik Cambria
Leslie Teo
36
11
0
08 Apr 2025
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
C. Xu
Wei Ping
P. Xu
Z. Liu
Boxin Wang
M. Shoeybi
Bo Li
Bryan Catanzaro
17
1
0
08 Apr 2025
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
N. Mudur
Hao Cui
Subhashini Venugopalan
Paul Raccuglia
M. Brenner
Peter C. Norgaard
LLMAG
ELM
LRM
38
0
0
08 Apr 2025
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
Will Cai
Tianneng Shi
Xuandong Zhao
Dawn Song
26
0
0
07 Apr 2025
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
Runjin Chen
Zhenyu (Allen) Zhang
Junyuan Hong
Souvik Kundu
Zhangyang Wang
OffRL
LRM
47
2
0
07 Apr 2025
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang
Yulin Chen
Jane Pan
Chen Zhao
Aurojit Panda
Jinyang Li
He He
ReLM
LRM
34
2
0
07 Apr 2025
A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization
Wenyuan Xu
Xiaochen Zuo
Chao Xin
Yu Yue
Lin Yan
Yonghui Wu
OffRL
14
1
0
07 Apr 2025
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Minki Kang
Jongwon Jeong
Jaewoong Cho
ALM
LRM
41
2
0
07 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Tianyi Zhou
Jieyu Zhao
LRM
76
1
0
07 Apr 2025
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Fan Nie
Lan Feng
Haotian Ye
Weixin Liang
Pan Lu
Huaxiu Yao
Alexandre Alahi
James Zou
78
0
0
07 Apr 2025
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLM
OffRL
LRM
49
3
0
07 Apr 2025
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation
J. S. Park
J. Park
Dongju Jang
Jiwan Chung
Byungwoo Yoo
Jaewoo Shin
S. Park
Taehyeong Kim
Youngjae Yu
41
0
0
04 Apr 2025
MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance
Chen Hu
Timothy Neate
Shan Luo
Letizia Gionfrida
39
0
0
04 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
42
0
0
04 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
29
0
0
04 Apr 2025
Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Gijs Bellaard
B. Smets
R. Duits
59
0
0
04 Apr 2025
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
H. Le
Dai Do
D. Nguyen
Svetha Venkatesh
OffRL
LRM
36
0
0
03 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
46
9
0
03 Apr 2025
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
Andres Algaba
Vincent Holst
Floriano Tori
Melika Mobini
Brecht Verbeken
Sylvia Wenmackers
Vincent Ginis
33
1
0
03 Apr 2025
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou
Zengzhi Wang
Nikhil Ranjan
Zhoujun Cheng
Liping Tang
Guowei He
Zhengzhong Liu
Eric P. Xing
LRM
38
1
0
03 Apr 2025
Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs
Khanh-Tung Tran
Barry O’Sullivan
Hoang D. Nguyen
LRM
32
0
0
02 Apr 2025
Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing
Jihyun Janice Ahn
Wenpeng Yin
SILM
LRM
56
1
0
02 Apr 2025
YourBench: Easy Custom Evaluation Sets for Everyone
S. Kamath S
Clémentine Fourrier
Alina Lozovskia
Thomas Wolf
Gökhan Tür
Dilek Hakkani-Tür
30
1
0
02 Apr 2025
Adaptive Rectification Sampling for Test-Time Compute Scaling
Zhendong Tan
Xingjun Zhang
Chaoyi Hu
Yancheng Pan
Shaoxun Wang
LRM
31
0
0
02 Apr 2025
AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems
Y. Yang
Huacan Chai
Shuai Shao
Y. Song
Siyuan Qi
Renting Rui
Weinan Zhang
AIFin
41
0
0
01 Apr 2025
Z1: Efficient Test-time Scaling with Code
Zhaojian Yu
Yinghao Wu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
37
1
0
01 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
66
0
0
01 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
52
1
0
01 Apr 2025
HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents
Shiyi Liu
Haiying Shen
Shuai Che
Mahdi Ghandi
Mingqin Li
LLMAG
48
0
0
01 Apr 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi
Hritik Bansal
Arian Hosseini
Aditya Grover
Kai-Wei Chang
Marcus Rohrbach
Anna Rohrbach
OffRL
LRM
37
0
0
01 Apr 2025
Hawkeye: Efficient Reasoning with Model Collaboration
Jianshu She
Z. Li
Zhemin Huang
Qi Li
Peiran Xu
Haonan Li
Qirong Ho
LRM
56
1
0
01 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Z. Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
90
3
0
01 Apr 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
38
0
0
31 Mar 2025
DebFlow: Automating Agent Creation via Agent Debate
Jinwei Su
Yinghui Xia
Ronghua Shi
Jianhui Wang
Jianuo Huang
Y. Wang
Tianyu Shi
Yang Jingsong
Lewei He
30
0
0
31 Mar 2025