ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.08322
  4. Cited By
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for
  Foundation Models

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

15 May 2023
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
Tangjun Su
Junteng Liu
Chuancheng Lv
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
    ELM
    LRM
ArXivPDFHTML

Papers citing "C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models"

50 / 68 papers shown
Title
Measuring Hong Kong Massive Multi-Task Language Understanding
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
34
0
0
04 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
Yu Zheng
Longyi Liu
Yuming Lin
Jie Feng
Guozhen Zhang
Depeng Jin
Yong Li
ELM
73
0
0
23 Apr 2025
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Dong Wang
ELM
28
0
0
17 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
63
6
1
14 Apr 2025
Large language models could be rote learners
Large language models could be rote learners
Yuyang Xu
Renjun Hu
Haochao Ying
J. Wu
Xing Shi
Wei Lin
ELM
48
0
0
11 Apr 2025
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
Zhijun Wang
Jiahuan Li
Hao Zhou
Rongxiang Weng
J. Wang
Xin Huang
Xue Han
Junlan Feng
Chao Deng
Shujian Huang
LRM
39
1
0
02 Apr 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
C. Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
100
2
0
07 Mar 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
72
4
0
20 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
J. Zhao
M. Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
53
1
0
18 Feb 2025
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging
Zhixiang Wang
Zhenyu Mao
Yixuan Qiao
Yunfang Wu
Biye Li
MoMe
73
0
0
17 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Z. Li
L. Zhang
P. Wang
49
0
0
17 Feb 2025
Valuable Hallucinations: Realizable Non-realistic Propositions
Valuable Hallucinations: Realizable Non-realistic Propositions
Qiucheng Chen
Bo Wang
LRM
54
0
0
16 Feb 2025
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis
Kaikai Zhao
Zhaoxiang Liu
Xuejiao Lei
Ning Wang
Zhenhong Long
...
W. Liu
Kai Wang
Wen Liu
Kai Wang
Shiguo Lian
ELM
LRM
52
1
0
16 Feb 2025
Large Language Diffusion Models
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
88
12
0
14 Feb 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Zhiqiang Zhang
Lei Liang
Jun Zhou
AI4CE
106
0
0
06 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
84
2
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
64
10
0
28 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
106
128
0
22 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
45
5
0
21 Jan 2025
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
Zhengmin Yu
Jiutian Zeng
Siyi Chen
Wenhan Xu
Dandan Xu
Xiangyu Liu
Zonghao Ying
Nan Wang
Yuan Zhang
Min Yang
ELM
108
1
0
20 Jan 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen
Manish Shetty
Gagan Somashekar
Minghua Ma
Yogesh L. Simmhan
Jonathan Mace
Chetan Bansal
Rujia Wang
Saravan Rajmohan
46
0
0
12 Jan 2025
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish
Xin Huang
Tarun K. Vangani
Minh Duc Pham
Xunlong Zou
Bin Wang
Zhengyuan Liu
A. Aw
LRM
34
0
0
21 Dec 2024
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Siyuan Wang
Dianyi Wang
Chengxing Zhou
Zejun Li
Zhihao Fan
Xuanjing Huang
Zhongyu Wei
VLM
87
0
0
17 Dec 2024
LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
Runming Yang
Taiqiang Wu
Jiahao Wang
Pengfei Hu
Ngai Wong
Yujiu Yang
Yujiu Yang
44
0
0
11 Nov 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang
Pei Zhang
Baosong Yang
Derek F. Wong
Rui-cang Wang
LRM
40
4
0
17 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
29
13
0
15 Oct 2024
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
45
1
0
15 Oct 2024
Enabling Real-Time Conversations with Minimal Training Costs
Enabling Real-Time Conversations with Minimal Training Costs
Wang Xu
Shuo Wang
Weilin Zhao
Xu Han
Yukun Yan
Yudi Zhang
Zhe Tao
Zhiyuan Liu
Wanxiang Che
19
4
0
18 Sep 2024
Training on the Benchmark Is Not All You Need
Training on the Benchmark Is Not All You Need
Shiwen Ni
Xiangtao Kong
Chengming Li
Xiping Hu
Ruifeng Xu
Jia Zhu
Min Yang
47
5
0
03 Sep 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
45
1
0
23 Aug 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
...
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Zenan Zhou
ELM
76
16
0
02 Aug 2024
ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis
  with Coarse-to-Fine In-context Learning
ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning
Senbin Zhu
Hanjie Zhao
Xingren Wang
Shanhong Liu
Yuxiang Jia
Hongying Zan
25
1
0
22 Jul 2024
It's Morphing Time: Unleashing the Potential of Multiple LLMs via
  Multi-objective Optimization
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization
Bingdong Li
Zixiang Di
Yanting Yang
Hong Qian
Peng Yang
Hao Hao
Ke Tang
Aimin Zhou
MoMe
19
5
0
29 Jun 2024
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for
  Foundation Models
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Zhi-Long Ji
Jin-Feng Bai
Zhen-Ru Pan
Fan-Hu Zeng
Jian Xu
Jia-Xin Zhang
Cheng-Lin Liu
ELM
23
10
0
28 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
S.
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
85
17
0
23 Jun 2024
The Music Maestro or The Musically Challenged, A Massive Music
  Evaluation Benchmark for Large Language Models
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
Jiajia Li
Lu Yang
Mingni Tang
Cong Chen
Zuchao Li
Ping Wang
Hai Zhao
LM&MA
29
4
0
22 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
37
25
0
18 Jun 2024
What is the best model? Application-driven Evaluation for Large Language
  Models
What is the best model? Application-driven Evaluation for Large Language Models
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM
ELM
27
2
0
14 Jun 2024
Cracking the Code of Juxtaposition: Can AI Models Understand the
  Humorous Contradictions
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
Zhe Hu
Tuo Liang
Jing Li
Yiren Lu
Yunlai Zhou
Yiran Qiao
Jing Ma
Yu Yin
36
4
0
29 May 2024
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Hongyu Wang
Jiayu Xu
Senwei Xie
Ruiping Wang
Jialin Li
Zhaojie Xie
Bin Zhang
Chuyan Xiong
Xilin Chen
ELM
VLM
LRM
83
7
0
24 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
71
136
0
29 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
58
36
0
23 Apr 2024
Evaluating the Factuality of Large Language Models using Large-Scale
  Knowledge Graphs
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Xiaoze Liu
Feijie Wu
Tianyang Xu
Zhuo Chen
Yichi Zhang
Xiaoqian Wang
Jing Gao
HILM
33
8
0
01 Apr 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
55
46
0
23 Mar 2024
ChatUIE: Exploring Chat-based Unified Information Extraction using Large
  Language Models
ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models
Jun Xu
Mengshu Sun
Zhiqiang Zhang
Jun Zhou
27
1
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
107
490
0
07 Mar 2024
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large
  Language Models
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
Yang Janet Liu
Meng Xu
Shuo Wang
Liner Yang
Haoyu Wang
...
Cunliang Kong
Yun-Nung Chen
Yang Liu
Maosong Sun
Erhong Yang
ELM
LRM
36
1
0
21 Feb 2024
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for
  Chinese Public Security Domain
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
Xin Tong
Bo Jin
Zhi Lin
Binjun Wang
Ting Yu
Qiang Cheng
ELM
17
0
0
11 Feb 2024
Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for
  Hallucination Mitigation
Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation
Yuxin Liang
Zhuoyang Song
Hao Wang
Jiaxing Zhang
HILM
21
28
0
27 Jan 2024
12
Next