LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
    ELM

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 559 papers shown
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
458
8
0
12 Jun 2025
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Y. Jiang
Yuwen Xiong
Yufeng Yuan
Chao Xin
Wenyuan Xu
Yu Yue
Qianchuan Zhao
Lin Yan
LRM
302
10
0
12 Jun 2025
OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics
Yaoming Zhu
Junxin Wang
Yiyang Li
Lin Qiu
Zongyu Wang
...
Xuezhi Cao
Yuhuai Wei
Mingshi Wang
Xunliang Cai
Rong Ma
LRM
334
3
0
12 Jun 2025
Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan
Tengyang Xie
LRM
317
16
0
10 Jun 2025
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner
Lei Zhang
Jiyan Yang
Min Yang
Zhiqiang Wang
Mouxiang Chen
Jiajun Zhang
Zeyu Cui
Binyuan Hui
Junyang Lin
294
6
0
10 Jun 2025
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku
Kohki Horie
Yoichi Iwata
Kensho Aoki
Naohiro Takahashi
Takuya Akiba
237
7
0
10 Jun 2025
Reinforcement Learning Teachers of Test Time Scaling
Edoardo Cetin
Tianyu Zhao
Yujin Tang
OffRL ReLM LRM
401
3
0
10 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
311
21
0
09 Jun 2025
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems
Yuhan Cao
Z. Chen
Kun Quan
Ziliang Zhang
Yu Wang
...
Can Zheng
Shouchen Zhou
Yuxiang Zhu
Yiming Huang
Tian Xie
ELM LRM
255
3
0
07 Jun 2025
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zichen Tang
Haihong E
Ziyan Ma
Haoyang He
Jiacheng Liu
...
Kun Ji
Qing Huang
Xinyang Hu
Wenshu Fan
Qianhe Zheng
AIMat AIFin ELM
371
8
0
06 Jun 2025
CodeContests+: High-Quality Test Case Generation for Competitive Programming
Zihan Wang
Siyao Liu
Yang Sun
Hongyan Li
Kai Shen
LRM
179
18
0
06 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
198
3
0
06 Jun 2025
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
Jiachen Zhu
Menghui Zhu
Renting Rui
Rong Shan
Congmin Zheng
...
Jianghao Lin
Weiwen Liu
Ruiming Tang
Yong Yu
Weinan Zhang
LLMAG ELM
295
6
0
06 Jun 2025
SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code
Xinghang Li
Jingzhe Ding
Chao Peng
Bing Zhao
Xiang Gao
Hongwan Gao
Xinchen Gu
ELM
300
6
0
06 Jun 2025
DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation
Jingyu Xiao
Ming Wang
Man Ho Lam
Yuxuan Wan
Junliang Liu
Yintong Huo
Michael R. Lyu
143
17
0
06 Jun 2025
hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Workshop on Machine Learning for CAD (ML4CAD), 2025
Charles Hong
Brendan Roberts
Huijae An
Alex Um
Advay Ratan
Y. Shao
399
3
0
05 Jun 2025
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki
Konrad Staniszewski
Piotr Nawrot
Edoardo Ponti
277
13
0
05 Jun 2025
Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning
Ho-Lam Chung
Teng-Yun Hsiao
Hsiao-Ying Huang
Chunerh Cho
Jian-Ren Lin
Zhang Ziwei
Yun-Nung Chen
LRM
353
4
0
05 Jun 2025
Quantifying Cross-Modality Memorization in Vision-Language Models
Yuxin Wen
Yangsibo Huang
Tom Goldstein
Ravi Kumar
Badih Ghazi
Chiyuan Zhang
332
2
0
05 Jun 2025
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
Boya Xiong
Shuo Wang
Weifeng Ge
Guanhua Chen
Yun-Nung Chen
MQ
232
0
0
05 Jun 2025
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
457
6
0
05 Jun 2025
Seed-Coder: Let the Code Model Curate Data for Itself
ByteDance Seed
Yuyu Zhang
Jing Su
Yifan Sun
Chenguang Xi
...
Jiaze Chen
Siyao Liu
Kai Shen
Liang Xiang
Yonghui Wu
SyDa LRM
342
22
0
04 Jun 2025
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
Yinjie Wang
Ling Yang
Ye Tian
Ke Shen
Mengdi Wang
LRM
357
21
0
03 Jun 2025
AI Scientists Fail Without Strong Implementation Capability
Minjun Zhu
Qiujie Xie
Yixuan Weng
Jian Wu
Zhen Lin
Linyi Yang
Yue Zhang
ELM
347
8
0
02 Jun 2025
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
Yiran Zhang
Mo Wang
Xiaoyang Li
Kaixuan Ren
Chencheng Zhu
Usman Naseem
LRM
445
5
0
02 Jun 2025
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
Desen Meng
Rui Huang
Zhilin Dai
Xinhao Li
Yifan Xu
...
Z. Huang
Meng Zhang
L. Zhang
Lu Dong
Limin Wang
OffRL VLM LRM
271
12
0
02 Jun 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
S. Wang
Le Yu
Chang Gao
Chujie Zheng
Shixuan Liu
...
Yang Yue
Qing Xiao
Bowen Yu
Gao Huang
Junyang Lin
LRM
345
228
0
02 Jun 2025
How Programming Concepts and Neurons Are Shared in Code Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Amir Hossein Kargaran
Yihong Liu
François Yvon
Hinrich Schütze
196
3
0
01 Jun 2025
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
Monoshi Kumar Roy
Simin Chen
Benjamin Steenhoek
Jinjun Peng
Gail E. Kaiser
Baishakhi Ray
Wei Le
LRM
248
4
0
31 May 2025
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu
Jiaxuan Gao
Xujie Shen
Chen Zhu
Zhiyu Mei
...
Jun Mei
Jiashu Wang
Tongkai Yang
Binhang Yuan
Yi Wu
OffRL SyDa LRM
512
89
0
30 May 2025
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Zafir Stojanovski
Oliver Stanley
Joe Sharratt
Richard Jones
Abdulhakeem Adefioye
Jean Kaddour
Andreas Kopf
OffRL LRM
378
38
0
30 May 2025
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang
Xinyu Zhu
Zilin Xiao
Minhao Jiang
Yu Meng
Jiawei Han
OffRL ReLM LRM
256
1
0
30 May 2025
HardTests: Synthesizing High-Quality Test Cases for LLM Coding
Zhongmou He
Yee Man Choi
Kexun Zhang
Jiabao Ji
Junting Zhou
Dejia Xu
Ivan Bercovich
Aidan Zhang
Lei Li
321
7
0
30 May 2025
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du
Luu Tuan Tuan
Yue Liu
Yuhao Qing
Dong Huang
Xinyi He
Qian Liu
Zejun Ma
See-Kiong Ng
321
6
0
29 May 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
Y. Liu
Kun Ouyang
Haoning Wu
Yi Liu
Lin Sui
Xinhao Li
Y. Zhong
Y. Charles
Xinyu Zhou
Xu Sun
VLM LRM
275
4
0
29 May 2025
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
Guangtao Zeng
Maohao Shen
Delin Chen
Zhenting Qi
Subhro Das
...
David D. Cox
G. Wornell
Wei Lu
Zhang-Wei Hong
Chuang Gan
278
5
0
29 May 2025
Infinite-Instruct: Synthesizing Scaling Code instruction Data with Bidirectional Synthesis and Static Verification
Wenjing Xing
Wenke Lu
Yeheng Duan
Bing Zhao
Zhenghui Kang
Yaolong Wang
Kai Gao
Lei Qiao
SyDa
184
0
0
29 May 2025
Can LLMs Reason Structurally? An Evaluation via the Lens of Data Structures
Yu He
Yingxi Li
Colin White
Ellen Vitercik
ELM LRM
226
1
0
29 May 2025
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
Manish Shetty
Naman Jain
Jinjian Liu
Vijay Kethanaboyina
Koushik Sen
Ion Stoica
ELM
270
10
0
29 May 2025
PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics
Atharva Naik
Darsh Agrawal
Yash Mathur
Manav Kapadnis
Yuwei An
Clayton Marr
Carolyn Rose
David R. Mortensen
LRM ELM
248
0
0
29 May 2025
VERINA: Benchmarking Verifiable Code Generation
Zhe Ye
Zhengxu Yan
Jingxuan He
Timothe Kasriel
Kaiyu Yang
Dawn Song
220
7
0
29 May 2025
What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
Gangwei Jiang
Yahui Liu
Zhaoyi Li
Qi Wang
Fuzheng Zhang
Linqi Song
Ying Wei
Defu Lian
LRM
199
8
0
28 May 2025
LASER: Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
287
0
0
28 May 2025
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Hanting Chen
Yasheng Wang
Kai Han
Dong Li
Lin Li
...
Hailin Hu
Yehui Tang
Dacheng Tao
Xinghao Chen
Yunhe Wang
LRM
230
18
0
28 May 2025
Scaling Reasoning without Attention
Xueliang Zhao
Wei Wu
Lingpeng Kong
OffRL ReLM LRM VLM
177
3
0
28 May 2025
MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Jaehyun Nam
Chang Jo Kim
Jiefeng Chen
Jinwoo Shin
Sercan Ö. Arık
Tomas Pfister
LLMAG
302
6
0
27 May 2025
Code Researcher: Deep Research Agent for Large Systems Code and Commit History
Ramneet Singh
Sathvik Joel
Abhav Mehrotra
Nalin Wadhwa
Ramakrishna Bairi
Aditya Kanade
Nagarajan Natarajan
LLMAG
169
9
0
27 May 2025
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni
Zhengyuan Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
W. Zuo
Lijuan Wang
ReLM LRM
300
12
0
26 May 2025
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
Junnan Liu
Hongwei Liu
Linchen Xiao
Shudong Liu
Taolin Zhang
Zihan Ma
Songyang Zhang
Kai Chen
LRM
354
3
0
26 May 2025
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu
Yuanxiang Fan
Z. L. Jiang
Han Ding
Yongyi Hu
...
Yunan Huang
Mozhi Zhang
Pengyu Zhao
Junjie Yan
Junxian He
OffRL NAI SyDa LRM ELM
330
21
0
26 May 2025