CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

23 February 2025
Alexander Zhang
Marcus Dong
Jiaheng Liu
Wei Zhang
Yejie Wang
Jian Yang
Ge Zhang
Tianyu Liu
Zhongyuan Peng
Yingshui Tan
Yuanxing Zhang
Zhexu Wang
Weixun Wang
Yancheng He
Ken Deng
Wangchunshu Zhou
Wenhao Huang
Zhaoxiang Zhang
    LRM
Abstract

The critique capacity of Large Language Models (LLMs) is essential for their reasoning abilities, as it provides necessary suggestions (e.g., detailed analysis and constructive feedback). Therefore, how to evaluate the critique capacity of LLMs has drawn great attention, and several critique benchmarks have been proposed. However, existing critique benchmarks usually have the following limitations: (1) they focus on diverse reasoning tasks in general domains and evaluate code tasks insufficiently (e.g., covering only the code generation task), and their queries are relatively easy (e.g., the code queries of CriticBench come from HumanEval and MBPP); (2) they lack comprehensive evaluation from different dimensions. To address these limitations, we introduce CodeCriticBench, a holistic code critique benchmark for LLMs. Specifically, CodeCriticBench includes two mainstream code tasks (i.e., code generation and code QA) of varying difficulty. Besides, the evaluation protocols include basic critique evaluation and advanced critique evaluation for different characteristics, where fine-grained evaluation checklists are well designed for the advanced setting. Finally, we conduct extensive experiments on existing LLMs, which show the effectiveness of CodeCriticBench.
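To illustrate the two evaluation protocols described above, below is a minimal sketch of how a single critique sample might be scored under a basic (binary correctness judgment) protocol and an advanced (checklist-based) protocol. The field names, class, and scoring scheme are illustrative assumptions, not the paper's released data format or official evaluation code.

```python
# Hypothetical sketch of scoring one CodeCriticBench-style sample.
# Field names ("task", "checklist", etc.) and the scoring scheme are
# illustrative assumptions, not the benchmark's actual schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CritiqueSample:
    task: str                      # "code_generation" or "code_qa"
    question: str                  # the coding query
    candidate_solution: str        # solution the LLM must critique
    is_correct: bool               # ground-truth correctness label
    checklist: List[str] = field(default_factory=list)  # fine-grained criteria


def basic_score(predicted_correct: bool, sample: CritiqueSample) -> float:
    """Basic critique evaluation: did the model judge correctness right?"""
    return float(predicted_correct == sample.is_correct)


def advanced_score(satisfied: List[bool], sample: CritiqueSample) -> float:
    """Advanced critique evaluation: fraction of checklist items the
    model's critique satisfies (uniform weighting assumed)."""
    assert len(satisfied) == len(sample.checklist)
    return sum(satisfied) / max(len(satisfied), 1)


# Toy usage example
sample = CritiqueSample(
    task="code_generation",
    question="Implement a function that reverses a string.",
    candidate_solution="def rev(s): return s[::-1]",
    is_correct=True,
    checklist=[
        "identifies whether the code runs",
        "checks edge cases (empty string)",
        "comments on readability",
    ],
)
print(basic_score(predicted_correct=True, sample=sample))   # 1.0
print(advanced_score([True, False, True], sample=sample))   # 0.666...
```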

@article{zhang2025_2502.16614,
  title={CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models},
  author={Alexander Zhang and Marcus Dong and Jiaheng Liu and Wei Zhang and Yejie Wang and Jian Yang and Ge Zhang and Tianyu Liu and Zhongyuan Peng and Yingshui Tan and Yuanxing Zhang and Zhexu Wang and Weixun Wang and Yancheng He and Ken Deng and Wangchunshu Zhou and Wenhao Huang and Zhaoxiang Zhang},
  journal={arXiv preprint arXiv:2502.16614},
  year={2025}
}