ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.01861
  4. Cited By
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on
  Class-level Code Generation

ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

3 August 2023
Xueying Du
Mingwei Liu
Kaixin Wang
Hanlin Wang
Junwei Liu
Yixuan Chen
Jiayi Feng
Chaofeng Sha
Xin Peng
Yiling Lou
    ELM
    ALM
ArXivPDFHTML

Papers citing "ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation"

20 / 20 papers shown
Title
YABLoCo: Yet Another Benchmark for Long Context Code Generation
YABLoCo: Yet Another Benchmark for Long Context Code Generation
Aidar Valeev
Roman Garaev
Vadim Lomshakov
Irina Piontkovskaya
Vladimir Ivanov
Israel Adewuyi
38
0
0
07 May 2025
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Yunseo Lee
John Youngeun Song
Dongsun Kim
Jindae Kim
Mijung Kim
Jaechang Nam
HILM
LRM
33
0
0
29 Apr 2025
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li
Xin Zhang
Zhongxin Guo
Shaoguang Mao
Wen Luo
Guangyue Peng
Yangyu Huang
Houfeng Wang
Scarlett Li
57
0
0
09 Mar 2025
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
Peiding Wang
L. Zhang
Fang Liu
Lin Shi
Minxiao Li
Bo Shen
An Fu
ELM
LRM
65
0
0
05 Mar 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
76
6
0
28 Jan 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
114
13
0
20 Jan 2025
The importance of visual modelling languages in generative software engineering
The importance of visual modelling languages in generative software engineering
Roberto Rossi
74
1
0
27 Nov 2024
MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification
MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification
Yin Li
Liangwei Wang
Shiyuan Piao
Boo-Ho Yang
Ziyue Li
Wei Zeng
Fugee Tsung
23
0
0
19 Oct 2024
Balancing Continuous Pre-Training and Instruction Fine-Tuning:
  Optimizing Instruction-Following in LLMs
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs
Ishan Jindal
Chandana Badrinath
Pranjal Bharti
Lakkidi Vinay
Sachin Dev Sharma
CLL
ALM
26
2
0
14 Oct 2024
Automatic Detection of LLM-generated Code: A Case Study of Claude 3
  Haiku
Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
Musfiqur Rahman
SayedHassan Khatoonabadi
Ahmad Abdellatif
Emad Shihab
26
3
0
02 Sep 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
66
4
0
08 Jul 2024
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Yuxuan Wan
Chaozheng Wang
Yi Dong
Wenxuan Wang
Shuqing Li
Yintong Huo
M. Lyu
3DV
66
10
0
24 Jun 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path
  Forward
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
35
6
0
12 Apr 2024
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Yao Wan
Yang He
Zhangqian Bi
Jianguo Zhang
Hongyu Zhang
Yulei Sui
Guandong Xu
Hai Jin
Philip S. Yu
20
20
0
30 Dec 2023
Inherent limitations of LLMs regarding spatial information
Inherent limitations of LLMs regarding spatial information
He Yan
Xinyao Hu
Xiangpeng Wan
Chengyu Huang
Kai Zou
Shiqi Xu
LRM
28
2
0
05 Dec 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of
  Large Language Models for Code Generation
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
178
780
0
02 May 2023
Multi-lingual Evaluation of Code Generation Models
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun
Sanjay Krishna Gouda
Zijian Wang
Xiaopeng Li
Yuchen Tian
...
Baishakhi Ray
Parminder Bhatia
Sudipta Sengupta
Dan Roth
Bing Xiang
ELM
112
117
0
26 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
242
1,070
0
05 Oct 2022
A Systematic Evaluation of Large Language Models of Code
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
202
624
0
26 Feb 2022
Measuring Coding Challenge Competence With APPS
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
614
0
20 May 2021
1