CMMLU: Measuring massive multitask language understanding in Chinese
arXiv: 2306.09212 (v1, v2; v2 is the latest)

Annual Meeting of the Association for Computational Linguistics (ACL), 2023
15 June 2023
Jinyan Su
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM, ELM

Papers citing "CMMLU: Measuring massive multitask language understanding in Chinese"

50 / 267 papers shown
TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhou Chen
Zhiqiang Wei
Yuqi Bai
Xue Xiong
Jianmin Wu
3DV
119
5
0
14 Jun 2025
CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation
Arnav Yayavaram
Siddharth Yayavaram
Simran Khanuja
Michael Saxon
Graham Neubig
184
0
0
10 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
259
19
0
09 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
171
3
0
06 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Jiajun Sun
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Qi Zhang
Xuanjing Huang
ELM
234
2
0
03 Jun 2025
MultiHoax: A Dataset of Multi-hop False-Premise Questions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammadamin Shafiei
Hamidreza Saffari
Nafise Sadat Moosavi
LRM
197
0
0
30 May 2025
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Hanjia Lyu
Jiebo Luo
Jian Kang
Allison Koenecke
175
6
0
28 May 2025
WiNGPT-3.0 Technical Report
Boqin Zhuang
Chenxiao Song
Huitong Lu
Jiacheng Qiao
Mingqian Liu
...
Xiaoxia Song
Xiangjun Xu
X. Chen
Yaoyao Ma
Y. Gao
LLMAG, LM&MA, LRM, AI4MH, ELM
315
0
0
23 May 2025
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
...
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
ELM
159
1
0
22 May 2025
KaFT: Knowledge-aware Fine-tuning for Boosting LLMs' Domain-specific Question-Answering Performance
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qihuang Zhong
Liang Ding
Xiantao Cai
Juhua Liu
Bo Du
Dacheng Tao
291
1
0
21 May 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team
Ao Liu
Botong Zhou
Can Xu
Chayse Zhou
...
Bingxin Qu
Bolin Ni
Boyu Wu
Chen Li
Cheng-peng Jiang
MoE, LRM, AI4CE
387
13
0
21 May 2025
Enhancing LLMs via High-Knowledge Data Selection
AAAI Conference on Artificial Intelligence (AAAI), 2025
Feiyu Duan
Xuemiao Zhang
Sirui Wang
Haoran Que
Yuqi Liu
Wenge Rong
Xunliang Cai
468
3
0
20 May 2025
S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models
Yuanbo Fang
Haoze Sun
Jun Liu
Tao Zhang
Guosheng Dong
Weipeng Chen
Xiaofen Xing
Xiangmin Xu
AuLLM, ELM
171
1
0
20 May 2025
Learnware of Language Models: Specialized Small Language Models Can Do Big
Zhi-Hao Tan
Zi-Chen Zhao
Hao-Yu Shi
Xin-Yu Zhang
Peng Tan
Yang Yu
Zhi Zhou
294
3
0
19 May 2025
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation
Khanh-Tung Tran
Barry O'Sullivan
Hoang D. Nguyen
ELM, LRM
337
3
0
16 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
Bo Shen
Cici
Dawei Zhu
...
Yun Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE, ReLM, LRM, AI4CE
514
37
0
12 May 2025
TeleEval-OS: Performance evaluations of large language models for operations scheduling
Yanyan Wang
Yingying Wang
Junli Liang
Yin Xu
Yunlong Liu
...
Fei Li
Long Zhao
Kuang Xu
Qi Song
Xiangyang Li
AI4TS
135
0
0
06 May 2025
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Dmitriy Shopkhoev
Ammar Ali
Magauiya Zhussip
Valentin Malykh
Stamatios Lefkimmiatis
N. Komodakis
Sergey Zagoruyko
VLM
1.0K
0
0
05 May 2025
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao
Yi Shen
Shuming Shi
Jieyun Huang
Z. Chen
Rongjia Du
Siqi Xiao
Jing Zhang
Ning Wang
Shiguo Lian
MQ
506
0
0
05 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Wenhan Luo
ELM
823
1
0
04 May 2025
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Minghao Wu
Weixuan Wang
Sinuo Liu
Huifeng Yin
Xintong Wang
Yu Zhao
Chenyang Lyu
Longyue Wang
Weihua Luo
Kaifu Zhang
ELM
292
13
0
22 Apr 2025
MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages
Dieuwke Hupkes
Nikolay Bogoychev
890
11
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
497
720
1
14 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM, LM&MA
234
1
0
13 Apr 2025
Enhancing Contrastive Demonstration Selection with Semantic Diversity for Robust In-Context Machine Translation
Owen Patterson
Chee Ng
192
0
0
12 Apr 2025
CARE: Multilingual Human Preference Learning for Cultural Awareness
Geyang Guo
Tarek Naous
Hiromi Wakaki
Yukiko Nishimura
Yuki Mitsufuji
Alan Ritter
Wei Xu
371
1
0
07 Apr 2025
Efficient Evaluation of Large Language Models via Collaborative Filtering
Xu-Xiang Zhong
Chao Yi
Han-Jia Ye
235
0
0
05 Apr 2025
Entropy-Based Block Pruning for Efficient Large Language Models
Liangwei Yang
Yuhui Xu
Juntao Tan
Doyen Sahoo
Siyang Song
Caiming Xiong
Han Wang
Shelby Heinecke
AAML
182
0
0
04 Apr 2025
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
Xiang Feng
Wentao Jiang
Zengmao Wang
Yong Luo
Pingbo Xu
Baosheng Yu
Hua Jin
Bo Du
Jing Zhang
ELM, LRM
302
0
0
03 Apr 2025
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhijun Wang
Jiahuan Li
Hao Zhou
Rongxiang Weng
Jiadong Wang
Xue Han
Xue Han
Junlan Feng
Chao Deng
Xin Huang
LRM
303
9
0
02 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
520
7
0
01 Apr 2025
TIB-STC: A Large-Scale Structured Tibetan Benchmark for Low-Resource Language Modeling
Cheng Huang
Fan Gao
Nyima Tashi
Yutong Liu
Xiangxiang Wang
...
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Hao Wang
Yongbin Yu
ALM
275
2
0
24 Mar 2025
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse
Ling Team
Wenting Cai
Yuchen Cao
Cai Chen
...
Wei Zhang
Zhenru Zhang
Hailin Zhao
Xunjin Zheng
Jun Zhou
ALM, MoE
233
7
0
22 Mar 2025
Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation
Shangqing Zhao
Yuhao Zhou
Yupei Ren
Zhe Chen
Chenghao Jia
Fang Zhe
Zhaogaung Long
Shu Liu
Man Lan
ALM, ELM
254
1
0
20 Mar 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
Tsz Chung Cheng
Chung Shing Cheng
Chaak Ming Lau
Eugene Tin-Ho Lam
Chun Yat Wong
Hoi On Yu
Cheuk Hei Chong
ELM
214
5
0
16 Mar 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
Xiaojian Li
Yongkang Leng
Ruiqing Ding
Hangjie Mo
Shanlin Yang
LRM
171
2
0
15 Mar 2025
TLUE: A Tibetan Language Understanding Evaluation Benchmark
Fan Gao
Cheng Huang
Nyima Tashi
Xiangxiang Wang
Thupten Tsering
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ELM
506
7
0
15 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM, ELM
931
1
0
14 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
Xiusi Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
Chong Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
239
5
0
12 Mar 2025
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao
Jingru Tan
Kaihuan Liang
Jiahao Hu
Jiahao Hu
Jiahao Hu
Yazhe Niu
Ruihao Gong
Dahua Lin
Ningyi Xu
403
1
0
10 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE, ALM
334
13
0
07 Mar 2025
Extrapolation Merging: Keep Improving With Extrapolation and Merging
Yiguan Lin
Bin Xu
Yinghao Li
Yang Gao
MoMe
204
2
0
05 Mar 2025
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
Zhenglin Wang
Jialong Wu
Pengfei Li
Yong Jiang
Deyu Zhou
168
0
0
24 Feb 2025
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Tianpeng Li
Qingbin Liu
Tao Zhang
Yuanbo Fang
Zheng Liang
...
Bin Cui
Jianhua Xu
Haoze Sun
Guosheng Dong
Xin Wu
AuLLM
226
46
0
24 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
Haoyang Li
Tao Xie
Baishakhi Ray
250
18
0
23 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
International Conference on Computational Linguistics (COLING), 2024
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
232
10
0
20 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
187
1
0
19 Feb 2025
KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mukhammed Togmanov
Nurdaulet Mukhituly
Diana Turmakhan
Jonibek Mansurov
Maiya Goloburda
...
Nurkhan Laiyk
Alham Fikri Aji
Ekaterina Kochmar
Preslav Nakov
Fajri Koto
ELM
190
8
0
18 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
291
20
0
17 Feb 2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jafar Isbarov
Arofat Akhundjanova
Mammad Hajili
Kavsar Huseynova
Dmitry Gaynullin
...
Amina Alisheva
Aizirek Turdubaeva
Abdullatif Köksal
Samir Rustamov
Duygu Ataman
ELM
219
5
0
16 Feb 2025