2505.16160
Cited By
v3 (latest)
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios
22 May 2025
Bin Xu
Yu Bai
Huashan Sun
Yiguan Lin
Siming Liu
Xinyue Liang
Yaolin Li
Yang Gao
Heyan Huang
AI4Ed
ELM
Papers citing
"EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios"
21 / 21 papers shown
Title
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
201
54
0
03 Apr 2025
Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education
Seungyoon Kim
Seungone Kim
AI4Ed
90
1
0
24 Jul 2024
Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation
Tianyu Wang
Nianjun Zhou
Zhixiong Chen
106
11
0
07 Jul 2024
Simulating Classroom Education with LLM-Empowered Agents
Zheyuan Zhang
Daniel Zhang-Li
Jifan Yu
Linlu Gong
Jinchang Zhou
Zhiyuan Liu
Lei Hou
Juanzi Li
LLMAG
80
65
0
27 Jun 2024
Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions
Hamidreza Rouzegar
Masoud Makrehchi
54
8
0
20 Jun 2024
Generating Educational Materials with Different Levels of Readability using LLMs
Chieh-Yang Huang
Jing Wei
Ting-Hao 'Kenneth' Huang
168
10
0
18 Jun 2024
Evaluating Contextually Personalized Programming Exercises Created with Generative AI
E. Logacheva
Arto Hellas
James Prather
Sami Sarsa
Juho Leinonen
116
11
0
11 Jun 2024
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought
Zhuoxuan Jiang
Haoyuan Peng
Shanshan Feng
Fan Li
Dongsheng Li
KELM
LRM
91
16
0
09 May 2024
Large Language Models for Education: A Survey and Outlook
Shen Wang
Tianlong Xu
Hang Li
Chaoli Zhang
Joleen Liang
Jiliang Tang
Philip S. Yu
Qingsong Wen
AI4Ed
114
121
0
26 Mar 2024
Large Language Models in Education: Vision and Opportunities
Wensheng Gan
Zhenlian Qi
Jiayang Wu
Chun-Wei Lin
AI4Ed
124
84
0
22 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
148
737
0
20 Nov 2023
Learning gain differences between ChatGPT and human tutor generated algebra hints
Z. Pardos
Shreya Bhandari
63
113
0
14 Feb 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
233
1,648
0
15 Dec 2022
Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book
Stephen MacNeil
Andrew Tran
Arto Hellas
Joanne Kim
Sami Sarsa
Paul Denny
Seth Bernstein
Juho Leinonen
104
190
0
04 Nov 2022
Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask
Bilal Ghanem
Lauren Lutz Coleman
Julia Rivard Dexter
Spencer McIntosh von der Ohe
Alona Fyshe
AI4Ed
54
32
0
06 Apr 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
417
4,606
0
27 Oct 2021
EQG-RACE: Examination-Type Question Generation
Xin Jia
Wenjie Zhou
Xu Sun
Yunfang Wu
AI4Ed
67
40
0
11 Dec 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
207
4,580
0
07 Sep 2020
SuperGlue: Learning Feature Matching with Graph Neural Networks
Paul-Edouard Sarlin
Daniel DeTone
Tomasz Malisiewicz
Andrew Rabinovich
3DPC
OffRL
178
1,957
0
26 Nov 2019
A Multi-language Platform for Generating Algebraic Mathematical Word Problems
Vijini Liyanage
Surangika Ranathunga
84
8
0
19 Nov 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.2K
7,210
0
20 Apr 2018