ResearchTrend.AI
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

31 March 2024
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose

Papers citing "CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks"

10 / 10 papers shown
Towards an Understanding of Context Utilization in Code Intelligence
Yanlin Wang
Kefeng Duan
Dewu Zheng
Ensheng Shi
F. Zhang
...
Xilin Liu
Yuchi Ma
Hongyu Zhang
Qianxiang Wang
Zibin Zheng
29
0
0
11 Apr 2025
ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation
Kaiyuan Liu
Youcheng Pan
J. Li
Daojing He
Yang Xiang
Yexing Du
Tianrun Gao
LLMAG
ELM
56
1
0
10 Mar 2025
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks
Xin Zhou
M. Weyssow
Ratnadira Widyasari
Ting Zhang
Junda He
Yunbo Lyu
Jianming Chang
Beiqi Zhang
Dan Huang
David Lo
PILM
146
0
0
10 Feb 2025
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
Qiming Zhu
Jialun Cao
Y. Lu
Hongyu Lin
Xianpei Han
Le Sun
S. Cheung
ALM
20
7
0
23 Aug 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
59
4
0
08 Jul 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
65
125
0
22 Jun 2024
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
Debalina Ghosh Paul
Hong Zhu
Ian Bayley
ALM
ELM
29
9
0
18 Jun 2024
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
178
780
0
02 May 2023
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
Victor C. Dibia
Adam Fourney
Gagan Bansal
Forough Poursabzi-Sangdeh
Han Liu
Saleema Amershi
ALM
OffRL
36
12
0
29 Oct 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
614
0
20 May 2021