M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

17 May 2023
Chuang Liu
Renren Jin
Yuqi Ren
Linhao Yu
Tianyu Dong
Xia Peng
Shuting Zhang
Jianxiang Peng
Peiyi Zhang
Qingqing Lyu
Xiaowen Su
Qun Liu
Deyi Xiong
ELM, ALM
arXiv: 2305.10263 | GitHub (101★)

Papers citing "M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models"

23 / 23 papers shown
SID: Benchmarking Guided Instruction Capabilities in STEM Education with a Socratic Interdisciplinary Dialogues Dataset
Mei Jiang
Houping Yue
Bingdong Li
Hao Hao
Ying Qian
Bo Jiang
Aimin Zhou
80
1
0
06 Aug 2025
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali
Saman Sarker Joy
ELM
155
1
0
25 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Wenhan Luo
ELM
831
1
0
04 May 2025
CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data
Qian-Wen Zhang
Haochen Wang
Fang Li
Siyu An
Lingfeng Qiao
Liangcai Gao
Di Yin
Xing Sun
ELM, AI4Ed
164
2
0
24 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
The Web Conference (WWW), 2024
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed, ELM
284
6
0
19 Sep 2024
Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights
Xinyu-Chen
Lianzhen-Zhang
LLMAG, AI4CE
362
6
0
14 Jul 2024
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
Dan Shi
Renren Jin
Shangda Wu
Weilong Dong
Xinwei Wu
Deyi Xiong
229
25
0
26 Jun 2024
What is the best model? Application-driven Evaluation for Large Language Models
Natural Language Processing and Chinese Computing (NLPCC), 2024
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM, ELM
197
4
0
14 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
360
27
0
12 Jun 2024
Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience
Haixia Han
Tingyun Li
Shisong Chen
Jie Shi
Chengyu Du
Yanghua Xiao
Jiaqing Liang
Xin Lin
175
14
0
16 Apr 2024
An Improved Traditional Chinese Evaluation Suite for Foundation Model
Zhi Rui Tam
Ya-Ting Pai
Yen-Wei Lee
Jun-Da Chen
Wei-Min Chu
Sega Cheng
Hong-Han Shuai
ELM
443
15
0
04 Mar 2024
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
ALM, ELM
201
90
0
15 Feb 2024
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jinchang Hou
Chang Ao
Haihong Wu
Xiangtao Kong
Zhigang Zheng
...
Chengming Li
Xiping Hu
Ruifeng Xu
Shiwen Ni
Min Yang
AI4Ed, ELM
145
7
0
29 Jan 2024
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Fajri Koto
Nurul Aisyah
Jinyan Su
Timothy Baldwin
AI4Ed, LRM, ELM
277
57
0
07 Oct 2023
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM, LLMAG
146
32
0
19 Aug 2023
Evaluating the Generation Capabilities of Large Chinese Language Models
AI Open (AO), 2023
Hui Zeng
Jingyuan Xue
Meng Hao
Chen Sun
Bin Ning
Na Zhang
ELM
138
13
0
09 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM, ELM
295
15
0
09 Aug 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM, ELM
212
97
0
19 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
700
2,667
0
06 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Neural Information Processing Systems (NeurIPS), 2023
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
241
8
0
04 Jul 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinyan Su
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM, ELM
424
393
0
15 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM, ELM
265
73
0
09 Jun 2023
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset
Neural Information Processing Systems (NeurIPS), 2023
Junling Liu
Peilin Zhou
Yining Hua
Dading Chong
Zhongyu Tian
...
Helin Wang
Chenyu You
Zhenhua Guo
Lei Zhu
Michael Lingzhi Li
LM&MA, ELM
408
112
0
05 Jun 2023