MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

17 August 2022
Federico Cassano
John Gouwar
Daniel Nguyen
S. Nguyen
Luna Phipps-Costin
Donald Pinckney
Ming-Ho Yee
Yangtian Zi
Carolyn Jane Anderson
Molly Q. Feldman
Arjun Guha
Michael Greenberg
Abhinav Jangda
    ELM

Papers citing "MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation"

50 / 68 papers shown
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
Yunhui Xia
Wei Shen
Yan Wang
Jason Klein Liu
Huifeng Sun
Siyue Wu
Jian Hu
Xiaolong Xu
AI4TS
23
1
0
20 Apr 2025
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Nikita Sorokin
I. Sedykh
Valentin Malykh
17
0
0
13 Apr 2025
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose
LRM
62
0
0
10 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
45
0
0
07 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
C. Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
97
2
0
07 Mar 2025
Deep-Bench: Deep Learning Benchmark Dataset for Code Generation
Alireza Daghighfarsoodeh
Chung-Yu Wang
Hamed Taherkhani
Melika Sepidband
Mohammad Abdollahi
Hadi Hemmati
Hung Viet Pham
ALM
ELM
93
0
0
26 Feb 2025
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
Ruizhong Qiu
Weiliang Will Zeng
Hanghang Tong
James Ezick
Christopher Lott
82
15
0
20 Feb 2025
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
C. Zhou
Xinyu Zhang
Dandan Song
Xiancai Chen
Wanli Gu
Huipeng Ma
Yuhang Tian
M. Zhang
Linmei Hu
63
1
0
13 Feb 2025
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks
Xin Zhou
M. Weyssow
Ratnadira Widyasari
Ting Zhang
Junda He
Yunbo Lyu
Jianming Chang
Beiqi Zhang
Dan Huang
David Lo
PILM
134
0
0
10 Feb 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
93
128
0
22 Jan 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
S. Cheung
ALM
69
1
0
18 Jan 2025
Code LLMs: A Taxonomy-based Survey
Nishat Raihan
Christian D. Newman
Marcos Zampieri
91
1
0
11 Dec 2024
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks
Rohit Dandamudi
Gema Rodríguez-Pérez
ELM
69
0
0
23 Nov 2024
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
J. Liu
Ken Deng
Congnan Liu
Jian Yang
Shukai Liu
...
Zekun Wang
Guoan Zhang
Bangyu Xiang
Wenbo Su
Bo Zheng
58
4
0
28 Oct 2024
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Jipeng Zhang
Jianshu Zhang
Yuanzhe Li
Renjie Pi
Rui Pan
Runtao Liu
Ziqiang Zheng
Tong Zhang
26
0
0
24 Oct 2024
Do Current Language Models Support Code Intelligence for R Programming Language?
Zixiao Zhao
Fatemeh H. Fard
ELM
31
0
0
10 Oct 2024
Rule-based Data Selection for Large Language Models
Xiaomin Li
Mingye Gao
Zhiwei Zhang
Chang Yue
Hong Hu
17
4
0
07 Oct 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
24
17
0
04 Oct 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
31
3
0
26 Sep 2024
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
Mingjie Liu
Yun-Da Tsai
Wenfei Zhou
Haoxing Ren
SyDa
3DV
41
3
0
19 Sep 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
65
195
0
18 Sep 2024
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Zachary S. Siegel
Sayash Kapoor
Nitya Nagdir
Benedikt Stroebl
Arvind Narayanan
14
8
0
17 Sep 2024
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Ben Bogin
Kejuan Yang
Shashank Gupta
Kyle Richardson
Erin Bransom
Peter Clark
Ashish Sabharwal
Tushar Khot
ELM
LRM
34
9
0
11 Sep 2024
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data
Hossein Hajipour
Lea Schönherr
Thorsten Holz
Mario Fritz
AAML
SyDa
21
0
0
10 Sep 2024
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Huy N. Phan
Phong X. Nguyen
Nghi D. Q. Bui
LLMAG
20
10
0
09 Sep 2024
Multi-Programming Language Ensemble for Code Generation in Large Language Model
Tengfei Xue
Xuefeng Li
Tahir Azim
Roman Smirnov
Jianhui Yu
Arash Sadrieh
Babak Pahlavan
13
2
0
06 Sep 2024
CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?
Yuwei Zhao
Ziyang Luo
Yuchen Tian
Hongzhan Lin
Weixiang Yan
Annan Li
Jing Ma
ELM
ALM
LRM
29
8
0
20 Aug 2024
Practical Attacks against Black-box Code Completion Engines
Slobodan Jenko
Jingxuan He
Niels Mündler
Mark Vero
Martin Vechev
ELM
AAML
SILM
19
3
0
05 Aug 2024
ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models
Hojae Han
Jaejin Kim
Jaeseok Yoo
Youngwon Lee
Seung-won Hwang
19
0
0
02 Aug 2024
On Leakage of Code Generation Evaluation Datasets
Alexandre Matton
Tom Sherborne
Dennis Aumiller
Elena Tommasone
Milad Alizadeh
Jingyi He
Raymond Ma
Maxime Voisin
Ellen Gilsenan-McMahon
Matthias Gallé
14
5
0
10 Jul 2024
Narrow Transformer: Starcoder-Based Java-LM For Desktop
Kamalkumar Rathinasamy
Balaji A J
Ankush Kumar
Gagan Gayari
Harshini K
Rajab Ali Mondal
S. SreenivasaRaghavanK
Swayam Singh
33
1
0
04 Jul 2024
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
Mehant Kammakomati
Sameer Pimparkhede
Srikanth G. Tamilselvam
Prince Kumar
Pushpak Bhattacharyya
ALM
29
0
0
03 Jul 2024
UniCoder: Scaling Code Large Language Model via Universal Code
Tao Sun
Linzheng Chai
Jian Yang
Yuwei Yin
Hongcheng Guo
Jiaheng Liu
Bing Wang
Liqun Yang
Zhoujun Li
OffRL
LRM
57
16
0
24 Jun 2024
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models
Sanjay Vishwakarma
Francis Harkins
Siddharth Golecha
Vishal Sharathchandra Bajpe
Nicolas Dupuis
Luca Buratti
David Kremer
Ismael Faro
Ruchir Puri
Juan Cruz-Benito
ELM
22
3
0
20 Jun 2024
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
Debalina Ghosh Paul
Hong Zhu
Ian Bayley
ALM
ELM
29
9
0
18 Jun 2024
ScenEval: A Benchmark for Scenario-Based Evaluation of Code Generation
Debalina Ghosh Paul
Hong Zhu
Ian Bayley
19
2
0
18 Jun 2024
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Chanjun Park
Hyeonwoo Kim
Dahyun Kim
Seonghwan Cho
Sanghoon Kim
Sukyung Lee
Yungi Kim
Hwalsuk Lee
ELM
ALM
17
14
0
31 May 2024
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Bin Lei
Yuchen Li
Qiuwu Chen
SyDa
ALM
ELM
28
6
0
23 May 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
30
1
0
26 Apr 2024
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Yifeng Ding
Jiawei Liu
Yuxiang Wei
Terry Yue Zhuo
Lingming Zhang
ALM
MoE
27
3
0
23 Apr 2024
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Jiawei Guo
Ziming Li
Xueling Liu
Kaijing Ma
Tianyu Zheng
...
Xingwei Qu
Xiang Yue
Ge Zhang
Wenhu Chen
Jie Fu
KELM
46
12
0
04 Apr 2024
SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens
Chengbo Liu
Yong Zhu
23
0
0
27 Mar 2024
Exploring Language Model's Code Generation Ability with Auxiliary Functions
Seonghyeon Lee
Sanghwan Jang
Seongbo Jang
Dongha Lee
Hwanjo Yu
ALM
17
1
0
15 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
24
260
0
12 Mar 2024
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Linyuan Gong
Sida Wang
Mostafa Elhoushi
Alvin Cheung
27
15
0
07 Mar 2024
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
Indraneil Paul
Goran Glavas
Iryna Gurevych
22
12
0
06 Mar 2024
CodeMind: A Framework to Challenge Large Language Models for Code Reasoning
Changshu Liu
Shizhuo Dylan Zhang
Ali Reza Ibrahimzada
Reyhaneh Jabbarvand
ELM
ReCod
LRM
28
0
0
15 Feb 2024
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Dong Huang
Yuhao Qing
Weiyi Shang
Heming Cui
Jie M. Zhang
74
10
0
03 Feb 2024
Visualization Generation with Large Language Models: An Evaluation
Guozheng Li
Xinyu Wang
Gerile Aodeng
Shunyuan Zheng
Yu Zhang
Chuangxin Ou
Song Wang
Chi Harold Liu
6
27
0
20 Jan 2024
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
14
61
0
19 Jan 2024