Evaluating Mathematical Reasoning of Large Language Models: A Focus on
Error Identification and Correction

2 June 2024

Papers citing "Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction"

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. Mingyang Song, Zhaochen Su, Xiaoye Qu, Jiawei Zhou, Yu-Xi Cheng. [LRM] 06 Jan 2025.
Number Cookbook: Number Understanding of Language Models and How to Improve It. Haotong Yang, Yi Hu, Shijia Kang, Zhouchen Lin, Muhan Zhang. [LRM] 06 Nov 2024.
Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning. An-Zi Yen, Wei-Ling Hsu. [LRM, AI4Ed] 20 Oct 2023.
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems. Kaya Stechly, Matthew Marquez, Subbarao Kambhampati. [LRM] 19 Oct 2023.
Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. [ELM, AI4MH, AI4CE, ALM] 22 Mar 2023.
Language Models are Multilingual Chain-of-Thought Reasoners. Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, ..., Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason W. Wei. [ReLM, LRM] 06 Oct 2022.
Large Language Models are Zero-Shot Reasoners. Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. [ReLM, LRM] 24 May 2022.
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. [OSLM, ALM] 04 Mar 2022.
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp. [AILaw, LRM] 18 Apr 2021.