LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida I. Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

Showing 9 of 559 citing papers.
HFT: Half Fine-Tuning for Large Language Models
Tingfeng Hui, Ying Tai, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu
29 Apr 2024
Can Language Models Solve Olympiad Programming?
Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao
16 Apr 2024
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, ..., Xingwei Qu, Xiang Yue, Ge Zhang, Lei Ma, Jie Fu
04 Apr 2024
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
Chun Xia, Yinlin Deng, Lingming Zhang
28 Mar 2024
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White
20 Feb 2024
AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability
Siwei Yang, Bingchen Zhao, Cihang Xie
14 Feb 2024
Mercury: A Code Efficiency Benchmark for Code Large Language Models
Mingzhe Du, Anh Tuan Luu, Bin Ji, Qian Liu, See-Kiong Ng
12 Feb 2024
Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
Workshop on Deep Learning for Testing and Testing for Deep Learning (LTTDL), 2023
Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
03 Oct 2023
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
International Conference on Learning Representations (ICLR), 2023
Ziyang Luo, Can Xu, Lu Wang, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang
14 Jun 2023