arXiv: 2404.00566

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

31 March 2024

Divyanshu Sheth

Daniel Fried

Papers citing "CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks"

12 / 12 papers shown

Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap

...

216

0

0

26 Aug 2025

CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation

141

0

0

26 Jul 2025

CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning

Monoshi Kumar Roy

Benjamin Steenhoek

242

4

0

31 May 2025

GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

Vijay Kethanaboyina

269

10

0

29 May 2025

Large Language Models for IT Automation Tasks: Are We There Yet?

Md Mahadi Hassan

189

1

0

26 May 2025

Towards an Understanding of Context Utilization in Code Intelligence

...

256

3

0

11 Apr 2025

ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

304

9

0

10 Mar 2025

LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks

Ratnadira Widyasari

994

26

0

10 Feb 2025

DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation

AAAI Conference on Artificial Intelligence (AAAI), 2024

Xianpei Han

Le Sun

Shing-Chi Cheung

145

18

0

23 Aug 2024

CodeUpdateArena: Benchmarking Knowledge Editing on API Updates

395

13

0

08 Jul 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Wenhao Yu

...

David Lo

Xiaoning Du

Leandro von Werra

603

371

0

22 Jun 2024

Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review

International Conference on Artificial Intelligence Testing (ICAIT), 2024

Debalina Ghosh Paul

186

31

0

18 Jun 2024