Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.05692
Cited By
Evaluating Mathematical Reasoning Beyond Accuracy
8 April 2024
Shijie Xia
Xuefeng Li
Yixin Liu
Tongshuang Wu
Pengfei Liu
LRM
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evaluating Mathematical Reasoning Beyond Accuracy"
16 / 16 papers shown
Title
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Hao-Lun Sun
Zile Qiao
Jiayan Guo
Xuanbo Fan
Yingyan Hou
Yong-feng Jiang
Pengjun Xie
Fei Huang
Yan Zhang
OffRL
45
0
0
07 May 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan
Ming Li
Lichao Sun
Tianyi Zhou
LRM
39
2
0
09 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
35
1
0
01 Apr 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
37
0
0
22 Mar 2025
Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving
Priscylla Silva
Evandro Costa
LRM
34
0
0
18 Mar 2025
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
S. M. I. Simon X. Yang
C. Wang
Yidong Wang
Xiaotao Gu
Minlie Huang
J. Tang
LRM
LLMAG
56
0
0
13 Mar 2025
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Ketan More
Omkar Thawakar
Ritesh Thawkar
...
F. Khan
Hisham Cholakkal
Ivan Laptev
Rao Muhammad Anwer
Salman Khan
LRM
59
0
0
13 Mar 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei
Yijun Yang
Junliang Xing
Yuanchun Shi
Zongqing Lu
Deheng Ye
OffRL
LRM
36
1
0
11 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
K. Zhang
ELM
48
0
0
09 Mar 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
J. Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Z. Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
60
6
0
26 Feb 2025
Evaluating Robustness of Reward Models for Mathematical Reasoning
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Jungsoo Won
Dongha Lee
Jinyoung Yeo
23
3
0
02 Oct 2024
Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
Huanxuan Liao
Shizhu He
Yao Xu
Yuanzhe Zhang
Kang Liu
Jun Zhao
LRM
42
3
0
20 Sep 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
72
3
0
19 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
26
25
0
18 Jun 2024
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions
Pengfei Hong
Navonil Majumder
Deepanway Ghosal
Somak Aditya
Rada Mihalcea
Soujanya Poria
LRM
20
3
0
17 Jan 2024
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
270
0
03 Oct 2022
1