arXiv:2306.09296

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

15 June 2023
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
Xin Lv
Hao Peng
Zijun Yao
Xiaohan Zhang
Hanming Li
Chun-yan Li
Zheyuan Zhang
Yushi Bai
Yantao Liu
Amy Xin
Nianyi Lin
Kaifeng Yun
Linlu Gong
Jianhui Chen
Zhili Wu
Y. Qi
Weikai Li
Yong Guan
Kaisheng Zeng
Ji Qi
Hailong Jin
Jinxin Liu
Yu Gu
Yuan Yao
Ning Ding
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
Abstract

The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are prevalently pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models, and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate 28 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.
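The abstract describes overall standard scores that make results numerically comparable across tasks and models, but it does not state the exact formula here. Below is a minimal sketch assuming a simple z-score style standardization of each task's raw metric across the evaluated models; the task names, model names, and numbers are hypothetical, and KoLA's actual scoring procedure may differ.

```python
import numpy as np

def standardize_scores(raw_scores):
    """Standardize raw per-task scores across models so that tasks with
    different metric scales become numerically comparable.

    A sketch of the 'overall standard scores' idea from the abstract,
    assuming a z-score per task; the benchmark's exact formula may differ.

    raw_scores: dict mapping task name -> {model name -> raw metric value}
    returns:    dict mapping task name -> {model name -> standardized score}
    """
    standardized = {}
    for task, per_model in raw_scores.items():
        values = np.array(list(per_model.values()), dtype=float)
        mean, std = values.mean(), values.std()
        standardized[task] = {
            model: 0.0 if std == 0 else (score - mean) / std
            for model, score in per_model.items()
        }
    return standardized

# Hypothetical example: two tasks whose raw metrics live on different scales.
raw = {
    "knowledge_memorization": {"model_a": 0.62, "model_b": 0.48, "model_c": 0.55},
    "knowledge_creating":     {"model_a": 31.0, "model_b": 27.5, "model_c": 24.0},
}
print(standardize_scores(raw))
```

After standardization, each model's score on a task expresses how far it sits from the mean of all evaluated models on that task, which is what allows scores to be aggregated and ranked across heterogeneous tasks.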
