
Measuring Mathematical Problem Solving With the MATH Dataset


5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,395 papers shown
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
Weijia Xu
Andrzej Banburski-Fahey
Nebojsa Jojic
ReLM
LRM
19
32
0
17 May 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
...
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
ELM
LRM
22
494
0
15 May 2023
Algebra Error Classification with Large Language Models
Hunter McNichols
Mengxue Zhang
Andrew S. Lan
14
6
0
08 May 2023
Code Execution with Pre-trained Language Models
Chenxiao Liu
Shuai Lu
Weizhu Chen
Daxin Jiang
Alexey Svyatkovskiy
Shengyu Fu
Neel Sundaresan
Nan Duan
ELM
20
21
0
08 May 2023
Non-Autoregressive Math Word Problem Solver with Unified Tree Structure
Yi Bin
Meng Han
Wenhao Shi
Lei Wang
Yang Yang
See-Kiong Ng
Heng Tao Shen
AIMat
19
7
0
08 May 2023
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Lei Wang
Wanyu Xu
Yihuai Lan
Zhiqiang Hu
Yunshi Lan
Roy Ka-Wei Lee
Ee-Peng Lim
ReLM
LRM
29
310
0
06 May 2023
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
Noah Hollmann
Samuel G. Müller
Frank Hutter
40
51
0
05 May 2023
Inducing anxiety in large language models increases exploration and bias
Julian Coda-Forno
Kristin Witte
A. Jagadish
Marcel Binz
Zeynep Akata
Eric Schulz
AI4CE
15
2
0
21 Apr 2023
Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
Matteo Muffo
A. Cocco
Enrico Bertino
ReLM
18
25
0
21 Apr 2023
ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness
Archiki Prasad
Swarnadeep Saha
Xiang Zhou
Mohit Bansal
LRM
18
43
0
21 Apr 2023
Learning to Plan with Natural Language
Yiduo Guo
Yaobo Liang
Chenfei Wu
Wenshan Wu
Dongyan Zhao
Nan Duan
LLMAG
LRM
26
6
0
20 Apr 2023
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Chuanyang Zheng
Zhengying Liu
Enze Xie
Zhenguo Li
Yu Li
LLMAG
ReLM
LRM
19
100
0
19 Apr 2023
Solving Math Word Problems by Combining Language Models With Symbolic Solvers
Joy He-Yueya
Gabriel Poesia
Rose E. Wang
Noah D. Goodman
22
112
0
16 Apr 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM
ELM
23
483
0
13 Apr 2023
Boosted Prompt Ensembles for Large Language Models
Silviu Pitis
Michael Ruogu Zhang
Andrew Wang
Jimmy Ba
LRM
LLMAG
16
40
0
12 Apr 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
G. Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Bernard Ghanem
SyDa
ALM
19
397
0
31 Mar 2023
Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases
Hongyu Ren
Mikhail Galkin
Michael Cochez
Zhaocheng Zhu
J. Leskovec
NAI
GNN
34
35
0
26 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
27
102
0
20 Mar 2023
Mind meets machine: Unravelling GPT-4's cognitive psychology
Sifatkaur Dhingra
Manmeet Singh
Vaisakh S.B.
Neetiraj Malviya
S. Gill
AI4MH
11
41
0
20 Mar 2023
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models
Alberto Testolin
AIMat
25
19
0
14 Mar 2023
Baldur: Whole-Proof Generation and Repair with Large Language Models
E. First
M. Rabe
Talia Ringer
Yuriy Brun
59
92
0
08 Mar 2023
Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference
Chi Wang
Susan Liu
Ahmed Hassan Awadallah
11
39
0
08 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
8
12,223
0
27 Feb 2023
Safety without alignment
András Kornai
M. Bukatin
Zsolt Zombori
LLMSV
11
0
0
27 Feb 2023
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
Zhangir Azerbayev
Bartosz Piotrowski
Hailey Schoelkopf
Edward W. Ayers
Dragomir R. Radev
J. Avigad
AIMat
11
66
0
24 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
6
233
0
11 Feb 2023
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
Tianjun Zhang
Fangchen Liu
Justin Wong
Pieter Abbeel
Joseph E. Gonzalez
8
43
0
10 Feb 2023
A Categorical Archive of ChatGPT Failures
Ali Borji
ELM
20
375
0
06 Feb 2023
Mathematical Capabilities of ChatGPT
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM
AI4MH
16
399
0
31 Jan 2023
Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs
Garett Cunningham
Razvan C. Bunescu
D. Juedes
LRM
18
16
0
05 Jan 2023
Biologically Inspired Design Concept Generation Using Generative Pre-Trained Transformers
Qihao Zhu
Xinyu Zhang
Jianxi Luo
AI4CE
23
50
0
26 Dec 2022
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Srinivasan Iyer
Xi Victoria Lin
Ramakanth Pasunuru
Todor Mihaylov
Daniel Simig
...
Jeff Wang
Christopher Dewan
Asli Celikyilmaz
Luke Zettlemoyer
Veselin Stoyanov
ALM
31
259
0
22 Dec 2022
Language models are better than humans at next-token prediction
Buck Shlegeris
Fabien Roger
Lawrence Chan
Euan McLean
ELM
LRM
16
11
0
21 Dec 2022
A Survey of Deep Learning for Mathematical Reasoning
Pan Lu
Liang Qiu
Wenhao Yu
Sean Welleck
Kai-Wei Chang
ReLM
LRM
32
137
0
20 Dec 2022
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
Seyed Mehran Kazemi
Najoung Kim
Deepti Bhatia
Xinyuan Xu
Deepak Ramachandran
LRM
19
76
0
20 Dec 2022
Towards Reasoning in Large Language Models: A Survey
Jie Huang
Kevin Chen-Chuan Chang
LM&MA
ELM
LRM
19
579
0
20 Dec 2022
Reasoning with Language Model Prompting: A Survey
Shuofei Qiao
Yixin Ou
Ningyu Zhang
Xiang Chen
Yunzhi Yao
Shumin Deng
Chuanqi Tan
Fei Huang
Huajun Chen
ReLM
ELM
LRM
49
307
0
19 Dec 2022
Plansformer: Generating Symbolic Plans using Transformers
Vishal Pallagani
Bharath Muppasani
K. Murugesan
F. Rossi
L. Horesh
Biplav Srivastava
F. Fabiano
Andrea Loreggia
LM&Ro
LLMAG
OffRL
10
35
0
16 Dec 2022
ALERT: Adapting Language Models to Reasoning Tasks
Ping Yu
Tianlu Wang
O. Yu. Golovneva
Badr AlKhamissi
Siddharth Verma
Zhijing Jin
Gargi Ghosh
Mona T. Diab
Asli Celikyilmaz
ReLM
LRM
25
21
0
16 Dec 2022
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ReLM
LRM
20
137
0
15 Dec 2022
Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety
Joshua Albrecht
Ellie Kitanidis
Abraham J. Fetterman
ELM
ReLM
ALM
LRM
14
17
0
13 Dec 2022
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaML
ReLM
AIMat
LRM
19
279
0
25 Nov 2022
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen
Xueguang Ma
Xinyi Wang
William W. Cohen
ReLM
ReCod
LRM
56
731
0
22 Nov 2022
PAL: Program-aided Language Models
Luyu Gao
Aman Madaan
Shuyan Zhou
Uri Alon
Pengfei Liu
Yiming Yang
Jamie Callan
Graham Neubig
ReLM
LRM
29
411
0
18 Nov 2022
Galactica: A Large Language Model for Science
Ross Taylor
Marcin Kardas
Guillem Cucurull
Thomas Scialom
Anthony Hartshorn
Elvis Saravia
Andrew Poulton
Viktor Kerkez
Robert Stojnic
ELM
ReLM
29
721
0
16 Nov 2022
Teaching Algorithmic Reasoning via In-context Learning
Hattie Zhou
Azade Nova
Hugo Larochelle
Aaron C. Courville
Behnam Neyshabur
Hanie Sedghi
LRM
ReLM
30
108
0
15 Nov 2022
Logical Tasks for Measuring Extrapolation and Rule Comprehension
Ippei Fujisawa
Ryota Kanai
ELM
LRM
20
4
0
14 Nov 2022
Development of a Neural Network-Based Mathematical Operation Protocol for Embedded Hexadecimal Digits Using Neural Architecture Search (NAS)
Victor Robila
Kexin Pei
Junfeng Yang
11
0
0
12 Nov 2022
A Simple, Yet Effective Approach to Finding Biases in Code Generation
Spyridon Mouselinos
Mateusz Malinowski
Henryk Michalewski
10
7
0
31 Oct 2022
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Albert Q. Jiang
Sean Welleck
Jin Peng Zhou
Wenda Li
Jiacheng Liu
M. Jamnik
Timothée Lacroix
Yuhuai Wu
Guillaume Lample
AIMat
58
157
0
21 Oct 2022