2305.10263
v1
v2 (latest)
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models
17 May 2023
Chuang Liu
Renren Jin
Yuqi Ren
Linhao Yu
Tianyu Dong
Xia Peng
Shuting Zhang
Jianxiang Peng
Peiyi Zhang
Qingqing Lyu
Xiaowen Su
Qun Liu
Deyi Xiong
ELM
ALM
Github (101★)
Papers citing
"M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models"
23 / 23 papers shown
Title
SID: Benchmarking Guided Instruction Capabilities in STEM Education with a Socratic Interdisciplinary Dialogues Dataset
Mei Jiang
Houping Yue
Bingdong Li
Hao Hao
Ying Qian
Bo Jiang
Aimin Zhou
80
1
0
06 Aug 2025
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali
Saman Sarker Joy
ELM
155
1
0
25 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Wenhan Luo
ELM
831
1
0
04 May 2025
CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data
Qian-Wen Zhang
Haochen Wang
Fang Li
Siyu An
Lingfeng Qiao
Liangcai Gao
Di Yin
Xing Sun
ELM
AI4Ed
164
2
0
24 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
The Web Conference (WWW), 2024
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
284
6
0
19 Sep 2024
Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights
Xinyu-Chen
Lianzhen-Zhang
LLMAG
AI4CE
362
6
0
14 Jul 2024
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
Dan Shi
Renren Jin
Shangda Wu
Weilong Dong
Xinwei Wu
Deyi Xiong
229
25
0
26 Jun 2024
What is the best model? Application-driven Evaluation for Large Language Models
Natural Language Processing and Chinese Computing (NLPCC), 2024
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM
ELM
197
4
0
14 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
360
27
0
12 Jun 2024
Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience
Haixia Han
Tingyun Li
Shisong Chen
Jie Shi
Chengyu Du
Yanghua Xiao
Jiaqing Liang
Xin Lin
175
14
0
16 Apr 2024
An Improved Traditional Chinese Evaluation Suite for Foundation Model
Zhi Rui Tam
Ya-Ting Pai
Yen-Wei Lee
Jun-Da Chen
Wei-Min Chu
Sega Cheng
Hong-Han Shuai
ELM
443
15
0
04 Mar 2024
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
ALM
ELM
201
90
0
15 Feb 2024
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jinchang Hou
Chang Ao
Haihong Wu
Xiangtao Kong
Zhigang Zheng
...
Chengming Li
Xiping Hu
Ruifeng Xu
Shiwen Ni
Min Yang
AI4Ed
ELM
145
7
0
29 Jan 2024
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Fajri Koto
Nurul Aisyah
Jinyan Su
Timothy Baldwin
AI4Ed
LRM
ELM
269
57
0
07 Oct 2023
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM
LLMAG
146
32
0
19 Aug 2023
Evaluating the Generation Capabilities of Large Chinese Language Models
AI Open (AO), 2023
Hui Zeng
Jingyuan Xue
Meng Hao
Chen Sun
Bin Ning
Na Zhang
ELM
138
13
0
09 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM
ELM
295
15
0
09 Aug 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
212
97
0
19 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
700
2,667
0
06 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Neural Information Processing Systems (NeurIPS), 2023
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
241
8
0
04 Jul 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinyan Su
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
424
393
0
15 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM
ELM
265
73
0
09 Jun 2023
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset
Neural Information Processing Systems (NeurIPS), 2023
Junling Liu
Peilin Zhou
Yining Hua
Dading Chong
Zhongyu Tian
...
Helin Wang
Chenyu You
Zhenhua Guo
Lei Zhu
Michael Lingzhi Li
LM&MA
ELM
408
112
0
05 Jun 2023