ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.03874
  4. Cited By
Measuring Mathematical Problem Solving With the MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset

5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML
ArXivPDFHTML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,395 papers shown
Title
Guiding Language Model Math Reasoning with Planning Tokens
Guiding Language Model Math Reasoning with Planning Tokens
Xinyi Wang
Lucas Page-Caccia
O. Ostapenko
Xingdi Yuan
William Yang Wang
Alessandro Sordoni
LRM
31
2
0
09 Oct 2023
MuggleMath: Assessing the Impact of Query and Response Augmentation on
  Math Reasoning
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning
Chengpeng Li
Zheng Yuan
Hongyi Yuan
Guanting Dong
Keming Lu
Jiancan Wu
Chuanqi Tan
Xiang Wang
Chang Zhou
LRM
12
21
0
09 Oct 2023
How Abilities in Large Language Models are Affected by Supervised
  Fine-tuning Data Composition
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Guanting Dong
Hongyi Yuan
Keming Lu
Chengpeng Li
Mingfeng Xue
Dayiheng Liu
Wei Wang
Zheng Yuan
Chang Zhou
Jingren Zhou
LRM
CLL
29
118
0
09 Oct 2023
Do Large Language Models Know about Facts?
Do Large Language Models Know about Facts?
Xuming Hu
Junzhe Chen
Xiaochuan Li
Yufei Guo
Lijie Wen
Philip S. Yu
Zhijiang Guo
HILM
KELM
20
49
0
08 Oct 2023
Talk like a Graph: Encoding Graphs for Large Language Models
Talk like a Graph: Encoding Graphs for Large Language Models
Bahare Fatemi
Jonathan J. Halcrow
Bryan Perozzi
AI4CE
8
93
0
06 Oct 2023
An In-Context Learning Agent for Formal Theorem-Proving
An In-Context Learning Agent for Formal Theorem-Proving
Amitayush Thakur
George Tsoukalas
Yeming Wen
Jimmy Xin
Swarat Chaudhuri
LLMAG
25
22
0
06 Oct 2023
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
Wanyun Cui
Qianle Wang
LRM
34
7
0
06 Oct 2023
Analysis of the Reasoning with Redundant Information Provided Ability of
  Large Language Models
Analysis of the Reasoning with Redundant Information Provided Ability of Large Language Models
Wenbei Xie
LRM
14
2
0
06 Oct 2023
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
  Reasoning
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang
Houxing Ren
Aojun Zhou
Zimu Lu
Sichun Luo
Weikang Shi
Renrui Zhang
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM
LRM
SyDa
22
94
0
05 Oct 2023
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct
  Preference Optimization
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou
Jie Liu
Chao Yang
Jing Shao
Yu Liu
Xiangyu Yue
Wanli Ouyang
Yu Qiao
24
47
0
05 Oct 2023
Concise and Organized Perception Facilitates Reasoning in Large Language Models
Concise and Organized Perception Facilitates Reasoning in Large Language Models
Junjie Liu
Shaotian Yan
Chen Shen
Zhengdong Xiao
Wenxiao Wang
Jieping Ye
Jieping Ye
LRM
8
1
0
05 Oct 2023
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of
  Large Language Models with Misconceptions
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions
Naiming Liu
Shashank Sonkar
Zichao Wang
Simon Woodhead
Richard G. Baraniuk
LRM
AI4Ed
15
14
0
03 Oct 2023
Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with
  Agent Team Optimization
Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
Zijun Liu
Yanzhe Zhang
Peng Li
Yang Janet Liu
Diyi Yang
LLMAG
26
103
0
03 Oct 2023
Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology
  View
Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View
Jintian Zhang
Xin Xu
Ningyu Zhang
Ruibo Liu
Bryan Hooi
Shumin Deng
LLMAG
30
123
0
03 Oct 2023
Instances Need More Care: Rewriting Prompts for Instances with LLMs in
  the Loop Yields Better Zero-Shot Performance
Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance
Saurabh Srivastava
Chengyue Huang
Weiguo Fan
Ziyu Yao
LLMAG
28
4
0
03 Oct 2023
Large Language Models as Analogical Reasoners
Large Language Models as Analogical Reasoners
Michihiro Yasunaga
Xinyun Chen
Yujia Li
Panupong Pasupat
J. Leskovec
Percy Liang
Ed H. Chi
Denny Zhou
ReLM
LRM
21
75
0
03 Oct 2023
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Xi Victoria Lin
Xilun Chen
Mingda Chen
Weijia Shi
Maria Lomeli
...
Jacob Kahn
Gergely Szilvasy
Mike Lewis
Luke Zettlemoyer
Scott Yih
RALM
34
129
0
02 Oct 2023
FELM: Benchmarking Factuality Evaluation of Large Language Models
FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Yiran Zhao
Jinghan Zhang
Ethan Chern
Siyang Gao
Pengfei Liu
Junxian He
HILM
14
33
0
01 Oct 2023
LEGO-Prover: Neural Theorem Proving with Growing Libraries
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang
Huajian Xin
Chuanyang Zheng
Lin Li
Zhengying Liu
...
Enze Xie
Jian Yin
Zhenguo Li
Heng Liao
Xiaodan Liang
LRM
39
61
0
01 Oct 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Minlie Huang
Nan Duan
Weizhu Chen
LRM
AI4CE
LLMAG
36
140
0
29 Sep 2023
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
  Toolsets
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan
Yangyi Chen
Xingyao Wang
Yi Ren Fung
Hao Peng
Heng Ji
LLMAG
KELM
21
57
0
29 Sep 2023
Qwen Technical Report
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
29
1,568
0
28 Sep 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the
  Evolutionary Path towards GPT-4 and Beyond
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Timothée Darcet
Yuyu Zhang
Yijie Zhu
Chenguang Xi
Pengyang Gao
Piotr Bojanowski
Kevin Chen-Chuan Chang
ELM
25
23
0
28 Sep 2023
Effective Long-Context Scaling of Foundation Models
Effective Long-Context Scaling of Foundation Models
Wenhan Xiong
Jingyu Liu
Igor Molybog
Hejia Zhang
Prajjwal Bhargava
...
Dániel Baráth
Sergey Edunov
Mike Lewis
Sinong Wang
Hao Ma
29
204
0
27 Sep 2023
Learning the Efficient Frontier
Learning the Efficient Frontier
Philippe Chatigny
Ivan Sergienko
Ryan Ferguson
Jordan Weir
Maxime Bergeron
19
1
0
27 Sep 2023
NLPBench: Evaluating Large Language Models on Solving NLP Problems
NLPBench: Evaluating Large Language Models on Solving NLP Problems
Linxin Song
Jieyu Zhang
Lechao Cheng
Pengyuan Zhou
Tianyi Zhou
Irene Z Li
ELM
LM&MA
LRM
28
10
0
27 Sep 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought
  Reasoning: Advances, Frontiers and Future
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yu Liu
Bing Qin
Ting Liu
LRM
AI4CE
21
149
0
27 Sep 2023
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?
Rui Li
Guoyin Wang
Jiwei Li
LRM
15
12
0
26 Sep 2023
ReConcile: Round-Table Conference Improves Reasoning via Consensus among
  Diverse LLMs
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Justin Chih-Yao Chen
Swarnadeep Saha
Mohit Bansal
LLMAG
LRM
27
118
0
22 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare
  Conversations Powered by Generative AI
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
23
65
0
21 Sep 2023
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language
  Models
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
L. Yu
Weisen Jiang
Han Shi
Jincheng Yu
Zhengying Liu
Yu Zhang
James T. Kwok
Zheng Li
Adrian Weller
Weiyang Liu
OSLM
LRM
39
323
0
21 Sep 2023
LPML: LLM-Prompting Markup Language for Mathematical Reasoning
LPML: LLM-Prompting Markup Language for Mathematical Reasoning
Ryutaro Yamauchi
Sho Sonoda
Akiyoshi Sannai
Wataru Kumagai
KELM
LRM
35
15
0
21 Sep 2023
Text2Reward: Reward Shaping with Language Models for Reinforcement
  Learning
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
Tianbao Xie
Siheng Zhao
Chen Henry Wu
Yitao Liu
Qian Luo
Victor Zhong
Yanchao Yang
Tao Yu
LM&Ro
34
48
0
20 Sep 2023
Design of Chain-of-Thought in Math Problem Solving
Design of Chain-of-Thought in Math Problem Solving
Zhanming Jie
Trung Quoc Luong
Xinbo Zhang
Xiaoran Jin
Hang Li
LRM
43
11
0
20 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language
  Feedback
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
125
138
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
66
699
0
19 Sep 2023
OWL: A Large Language Model for IT Operations
OWL: A Large Language Model for IT Operations
Hongcheng Guo
Jian Yang
Jiaheng Liu
Liqun Yang
Linzheng Chai
...
Tieqiao Zheng
Liangfan Zheng
Bo-Wen Zhang
Ke Xu
Zhoujun Li
VLM
66
41
0
17 Sep 2023
Contrastive Decoding Improves Reasoning in Large Language Models
Contrastive Decoding Improves Reasoning in Large Language Models
Sean O'Brien
Mike Lewis
SyDa
LRM
ReLM
15
31
0
17 Sep 2023
Chain-of-Thought Reasoning is a Policy Improvement Operator
Chain-of-Thought Reasoning is a Policy Improvement Operator
Hugh Zhang
David C. Parkes
ReLM
LM&Ro
LRM
14
12
0
15 Sep 2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction
  Tuning
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Xiang Yue
Xingwei Qu
Ge Zhang
Yao Fu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
AIMat
LRM
54
361
0
11 Sep 2023
Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity
  Recognition
Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity Recognition
Michael Beukman
Manuel A. Fokam
11
2
0
11 Sep 2023
FIMO: A Challenge Formal Dataset for Automated Theorem Proving
FIMO: A Challenge Formal Dataset for Automated Theorem Proving
Chengwu Liu
Jianhao Shen
Huajian Xin
Zhengying Liu
Ye Yuan
...
Chuanyang Zheng
Yichun Yin
Lin Li
Ming Zhang
Qun Liu
AIMat
AI4CE
27
31
0
08 Sep 2023
GPT Can Solve Mathematical Problems Without a Calculator
GPT Can Solve Mathematical Problems Without a Calculator
Z. Yang
Ming Ding
Qingsong Lv
Zhihuan Jiang
Zehai He
Yuyi Guo
Jinfeng Bai
Jie Tang
RALM
LRM
26
52
0
06 Sep 2023
AGIBench: A Multi-granularity, Multimodal, Human-referenced,
  Auto-scoring Benchmark for Large Language Models
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Fei Tang
Wanling Gao
Luzhou Peng
Jianfeng Zhan
ELM
14
2
0
05 Sep 2023
No Train Still Gain. Unleash Mathematical Reasoning of Large Language
  Models with Monte Carlo Tree Search Guided by Energy Function
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function
Haotian Xu
LRM
22
6
0
01 Sep 2023
When Do Program-of-Thoughts Work for Reasoning?
When Do Program-of-Thoughts Work for Reasoning?
Zhen Bi
Ningyu Zhang
Yinuo Jiang
Shumin Deng
Guozhou Zheng
Huajun Chen
LRM
14
20
0
29 Aug 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on
  Language, Multimodal, and Scientific GPT Models
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
30
4
0
27 Aug 2023
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for
  Scientific Research
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
ELM
40
69
0
25 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
21
531
0
21 Aug 2023
GameEval: Evaluating LLMs on Conversational Games
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM
LLMAG
19
20
0
19 Aug 2023
Previous
123...2425262728
Next