2103.03874
Cited By
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Papers citing
"Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,395 papers shown
Title
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models
Liwen Zhang
Wei Cai
Zhaowei Liu
Zhi Yang
Wei Dai
...
Zhiqiang Liu
Zhoufan Zhu
Anbo Wu
Xinnan Guo
Yun Chen
ELM
ALM
25
24
0
19 Aug 2023
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang
Guiming Hardy Chen
Dingjie Song
Zhiyi Zhang
Zhihong Chen
...
Feng Jiang
Jianquan Li
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
ELM
AI4MH
25
77
0
17 Aug 2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG
AI4CE
48
271
0
16 Aug 2023
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Aojun Zhou
Ke Wang
Zimu Lu
Weikang Shi
Sichun Luo
...
Shaoqing Lu
Anya Jia
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM
LRM
22
144
0
15 Aug 2023
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Ariel N. Lee
Cole J. Hunter
Nataniel Ruiz
ALM
ObjD
18
134
0
14 Aug 2023
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
K. Lu
Hongyi Yuan
Zheng Yuan
Runji Lin
Junyang Lin
Chuanqi Tan
Chang Zhou
Jingren Zhou
ALM
LRM
27
63
0
14 Aug 2023
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min
Suchin Gururangan
Eric Wallace
Hannaneh Hajishirzi
Noah A. Smith
Luke Zettlemoyer
AILaw
22
63
0
08 Aug 2023
Cumulative Reasoning with Large Language Models
Yifan Zhang
Jingqin Yang
Yang Yuan
Andrew Chi-Chih Yao
ReLM
ELM
LRM
AI4CE
29
67
0
08 Aug 2023
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu
Xukun Liu
Hua Shen
Zeyu Han
Yuhan Li
Murong Yue
Zhi-Ping Peng
Yuchen Liu
Ziyu Yao
Dongkuan Xu
LLMAG
22
19
0
08 Aug 2023
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
Ning Miao
Yee Whye Teh
Tom Rainforth
ReLM
LRM
17
109
0
01 Aug 2023
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
Jiaao Chen
Xiaoman Pan
Dian Yu
Kaiqiang Song
Xiaoyang Wang
Dong Yu
Jianshu Chen
ReLM
LRM
13
24
0
01 Aug 2023
Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
Itay Itzhak
Gabriel Stanovsky
Nir Rosenfeld
Yonatan Belinkov
19
19
0
01 Aug 2023
Three Bricks to Consolidate Watermarks for Large Language Models
Pierre Fernandez
Antoine Chaffin
Karim Tit
Vivien Chappelier
Teddy Furon
WaLM
9
47
0
26 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM
LRM
29
37
0
25 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
23
84
0
20 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
514
0
12 Jul 2023
Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models
Yuxi Ma
Chi Zhang
Song-Chun Zhu
ELM
ALM
27
8
0
07 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
58
1,496
0
06 Jul 2023
CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?
Tianwen Wei
Jian Luan
W. Liu
Shuang Dong
B. Wang
ELM
25
30
0
29 Jun 2023
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang
Aidan M. Swope
Alex Gu
Rahul Chalamala
Peiyang Song
Shixing Yu
Saad Godil
R. Prenger
Anima Anandkumar
RALM
12
207
0
27 Jun 2023
ToolQA: A Dataset for LLM Question Answering with External Tools
Yuchen Zhuang
Yue Yu
Kuan-Chieh Jackson Wang
Haotian Sun
Chao Zhang
ELM
LLMAG
14
211
0
23 Jun 2023
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving
Wayne Xin Zhao
Kun Zhou
Beichen Zhang
Zheng Gong
Zhipeng Chen
...
Ji-Rong Wen
Jing Sha
Shijin Wang
Cong Liu
Guoping Hu
MoE
LRM
42
5
0
19 Jun 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
37
237
0
15 Jun 2023
The ADAIO System at the BEA-2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues
Adaeze Adigwe
Zheng Yuan
ELM
11
4
0
08 Jun 2023
StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code
Hannah McLean Babe
S. Nguyen
Yangtian Zi
Arjun Guha
Molly Q. Feldman
Carolyn Jane Anderson
ALM
LRM
37
35
0
07 Jun 2023
Deductive Verification of Chain-of-Thought Reasoning
Z. Ling
Yunhao Fang
Xuanlin Li
Zhiao Huang
Mingu Lee
Roland Memisevic
Hao Su
ReLM
LRM
22
123
0
06 Jun 2023
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning
Beichen Zhang
Kun Zhou
Xilin Wei
Wayne Xin Zhao
Jing Sha
Shijin Wang
Ji-Rong Wen
LRM
25
33
0
04 Jun 2023
Learning Multi-Step Reasoning by Solving Arithmetic Tasks
Tianduo Wang
Wei Lu
ReLM
LRM
17
14
0
02 Jun 2023
MathChat: Converse to Tackle Challenging Math Problems with LLM Agents
Yiran Wu
Feiran Jia
Shaokun Zhang
Han-Tai Li
Erkang Zhu
Yue Wang
Y. Lee
Richard Peng
Qingyun Wu
Chi Wang
LLMAG
22
49
0
02 Jun 2023
Decision-Oriented Dialogue for Human-AI Collaboration
Jessy Lin
Nicholas Tomlin
Jacob Andreas
J. Eisner
LLMAG
13
26
0
31 May 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
11
855
0
31 May 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
36
178
0
29 May 2023
Matrix Information Theory for Self-Supervised Learning
Yifan Zhang
Zhi-Hao Tan
Jingqin Yang
Weiran Huang
Yang Yuan
SSL
40
16
0
27 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG
ELM
LRM
ReLM
33
109
0
26 May 2023
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving
Xueliang Zhao
Wenda Li
Lingpeng Kong
22
28
0
25 May 2023
Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models
Daman Arora
H. Singh
Mausam
ELM
LRM
17
49
0
24 May 2023
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems
Marek Kadlčík
Michal Štefánik
Ondřej Sotolář
Vlastimil Martinek
LRM
14
13
0
24 May 2023
The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models
Jingyuan Qi
Zhiyang Xu
Ying Shen
Minqian Liu
Dingnan Jin
Qifan Wang
Lifu Huang
ReLM
LRM
KELM
19
11
0
24 May 2023
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning
Alexander Scarlatos
Andrew S. Lan
OffRL
LRM
21
20
0
23 May 2023
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
Zhiheng Xi
Senjie Jin
Yuhao Zhou
Rui Zheng
Songyang Gao
Tao Gui
Qi Zhang
Xuanjing Huang
ReLM
LRM
25
44
0
23 May 2023
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG
LRM
42
592
0
23 May 2023
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
Z. Chen
Kun Zhou
Beichen Zhang
Zheng Gong
Wayne Xin Zhao
Ji-Rong Wen
KELM
LRM
19
27
0
23 May 2023
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
Cheng Qian
Chi Han
Yi Ren Fung
Yujia Qin
Zhiyuan Liu
Heng Ji
LRM
13
28
0
23 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Y. Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
20
2
0
23 May 2023
TheoremQA: A Theorem-driven Question Answering dataset
Wenhu Chen
Ming Yin
Max W.F. Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
AIMat
17
117
0
21 May 2023
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
Dao Xuan-Quy
Le Ngoc-Bich
Vo The-Duy
Phan Xuan-Dung
Ngo Bac-Bien
Nguyen Van-Tien
Nguyen Thi-My-Thanh
Nguyen Hong-Phuoc
14
16
0
20 May 2023
LogiCoT: Logical Chain-of-Thought Instruction-Tuning
Hanmeng Liu
Zhiyang Teng
Leyang Cui
Chaoli Zhang
Qiji Zhou
Yue Zhang
LRM
17
22
0
20 May 2023
OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models
Badr AlKhamissi
Siddharth Verma
Ping Yu
Zhijing Jin
Asli Celikyilmaz
Mona T. Diab
LRM
ReLM
20
10
0
19 May 2023
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models
Raj Sanjay Shah
Vijay Marupudi
Reba Koenen
Khushi Bhardwaj
Sashank Varma
19
6
0
18 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
53
1,138
0
17 May 2023