LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
    ELM

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 559 papers shown
OptiHive: Ensemble Selection for LLM-Based Optimization via Statistical Modeling
Maxime Bouscary
Saurabh Amin
111
0
0
04 Aug 2025
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Zizhuo Zhang
Jianing Zhu
Xinmu Ge
Zihua Zhao
Zhanke Zhou
Xuan Li
Xiao Feng
Jiangchao Yao
Bo Han
ALM, LRM
298
0
0
01 Aug 2025
UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents
Jianqiang Xiao
Yuexuan Sun
Yixin Shao
Boxi Gan
Rongqiang Liu
Yanjing Wu
Weili Gua
Xiang Deng
278
0
0
01 Aug 2025
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Yihong Dong
Xue Jiang
Yongding Tao
Huanyu Liu
Kechi Zhang
...
Binhua Li
Zhi Jin
Fei Huang
Y. Li
Ge Li
LRM
369
17
0
31 Jul 2025
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Zunhai Su
Qingyuan Li
Hao Zhang
Weihao Ye
Qibo Xue
YuLei Qian
Yuchen Xie
Ngai Wong
Kehong Yuan
MoE
277
2
0
31 Jul 2025
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
Q. Guo
Wei Xie
Xiaofang Cai
Enze Wang
Shuoyoucheng Ma
Kai Chen
Xiaofeng Wang
Baosheng Wang
ELM
191
0
0
30 Jul 2025
IFEvalCode: Controlled Code Generation
J. Yang
Wei Emma Zhang
Shukai Liu
Linzheng Chai
Y. Tan
...
Wangchunshu Zhou
Guanglin Niu
Zhoujun Li
Binyuan Hui
Junyang Lin
ALM
234
3
0
30 Jul 2025
Kimi K2: Open Agentic Intelligence
Kimi Team
Yifan Bai
Yiping Bao
Guanduo Chen
Jiahao Chen
...
Qifeng Teng
Chensi Wang
Dinglu Wang
Feng Wang
Haiming Wang
MoE, VLM, LRM
179
81
0
28 Jul 2025
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
Honghua Dong
Jiacheng Yang
Xun Deng
Yuhe Jiang
Gennady Pekhimenko
Fan Long
X. Si
208
2
0
28 Jul 2025
Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang
Zhiyuan Fan
Jiayu Liu
J. Huang
Yi R. Fung
LRM
488
6
0
27 Jul 2025
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback
Qiushi Sun
Jinyang Gong
Lei Li
Qipeng Guo
Fei Yuan
SyDa
151
2
0
25 Jul 2025
MemoCoder: Automated Function Synthesis using LLM-Supported Agents
Yiping Jia
Zhen Ming Jiang
Shayan Noei
Ying Zou
LLMAG, KELM
211
0
0
24 Jul 2025
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian
Jiapeng Wang
Qian Zhao
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Jiaxin Mao
Wayne Xin Zhao
Zhiqiang Zhang
Jun Zhou
MoMe, CLL
251
6
0
23 Jul 2025
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
Zhuokun Chen
Zeren Chen
Jiahao He
Lu Sheng
Zhuliang Yu
Jianfei Cai
Bohan Zhuang
LRM
416
2
0
23 Jul 2025
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Changxin Tian
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Zhiqiang Zhang
Jun Zhou
MoE
385
12
0
23 Jul 2025
Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
Derek Li
Jiaming Zhou
Amirreza Kazemi
Qianyi Sun
Abbas Ghaddar
...
Liheng Ma
Yu-Juan Luo
Dong Li
Feng Wen
Jianye Hao
LRM
255
0
0
20 Jul 2025
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Ori Press
Brandon Amos
Haoyu Zhao
Yikai Wu
Samuel K. Ainsworth
...
K. Lieret
Hanlin Zhang
Shirley Huang
Matthias Bethge
Ofir Press
ALM, ELM, LM&MA
281
4
0
19 Jul 2025
Impact of Code Context and Prompting Strategies on Automated Unit Test Generation with Modern General-Purpose Large Language Models
Jakub Walczak
Piotr Tomalak
Artur Laskowski
ELM, LRM
76
0
0
18 Jul 2025
GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
Diganta Misra
Nizar Islah
Victor May
Brice Rauby
Zihan Wang
...
Muawiz Chaudhary
Eilif B. Muller
Irina Rish
Samira Ebrahimi Kahou
Massimo Caccia
ELM
231
1
0
16 Jul 2025
Quantum Machine Learning in Multi-Qubit Phase-Space Part I: Foundations
Timothy Heightman
Edward Jiang
Ruth Mora-Soto
Maciej Lewenstein
Marcin Płodzień
315
4
0
16 Jul 2025
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
Artem Chervyakov
Alexander Kharitonov
Pavel Zadorozhny
Adamenko Pavel
Rodion Levichev
...
Anton A. Emelyanov
Dmitrii Babaev
Vladimir Ivanov
Valentin Malykh
Alena Fenogenova
ELM
123
0
0
16 Jul 2025
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
Hongchao Jiang
Yiming Chen
Yushi Cao
Hung-yi Lee
R. Tan
ELM, LRM
170
9
0
14 Jul 2025
VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains
Xuzhao Li
Xuchen Li
Shiyu Hu
Yongzhen Guo
Wentao Zhang
OffRL, ALM, LRM
271
9
0
14 Jul 2025
RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Yue Wang
Zheyong Xie
Ziyan Liu
...
Jun Fan
Xiaolong Jiang
Weiting Liu
Boyang Wang
Shaosheng Cao
ALM
219
0
0
13 Jul 2025
AICrypto: A Comprehensive Benchmark for Evaluating Cryptography Capabilities of Large Language Models
Yu Wang
Y. Liu
Liheng Ji
Han Luo
Wenjie Li
...
Geyuan Zhang
X. Li
Rongwu Xu
Yilei Chen
Tianxing He
ELM
372
2
0
13 Jul 2025
KAT-V1: Kwai-AutoThink Technical Report
Zizheng Zhan
Ken Deng
Huaixi Tang
Wen Xiang
Kun Wu
...
J. Yang
Guang Chen
Haotian Zhang
Bin Chen
Bing Yu
OffRL, ALM, LRM
339
7
0
11 Jul 2025
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
Jaedong Hwang
Kumar Tanmay
Seok-Jin Lee
Ayush Agrawal
Hamid Palangi
Kumar Ayush
Ila R Fiete
Paul Pu Liang
LRM
238
5
0
07 Jul 2025
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei
Liang Zhao
Jianjian Sun
Kangheng Lin
Jisheng Yin
...
Qi Han
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Vishal M. Patel
OffRL, ReLM, LRM, VLM
223
14
0
07 Jul 2025
Controlling Thinking Speed in Reasoning Models
Zhengkai Lin
Zhihang Fu
Ze Chen
Chao Chen
Liang Xie
Wenxiao Wang
Deng Cai
Zheng Wang
Jieping Ye
LRM
141
7
0
04 Jul 2025
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
Zeyu Huang
Tianhao Cheng
Zihan Qiu
Zili Wang
Yinghui Xu
Edoardo M. Ponti
Ivan Titov
337
16
0
02 Jul 2025
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Maggie Huan
Yuetai Li
Tuney Zheng
Xiaoyu Xu
Seungone Kim
Minxin Du
Radha Poovendran
Graham Neubig
Xiang Yue
LRM, ELM
200
47
0
01 Jul 2025
Lost at the Beginning of Reasoning
Baohao Liao
Xinyi Chen
Sara Rajaee
Yuhui Xu
Christian Herold
Anders Søgaard
Maarten de Rijke
Christof Monz
LRM
211
5
0
27 Jun 2025
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
Xin Xu
Tianhao Chen
Fan Zhang
Wanlong Liu
Pengxiang Li
...
Hao Chen
Shiwei Liu
Boyao Wang
Can Yang
Lu Yin
LLMAG, LRM, KELM
289
1
0
26 Jun 2025
LastingBench: Defend Benchmarks Against Knowledge Leakage
Yixiong Fang
Tianran Sun
Yuling Shi
Min Wang
Xiaodong Gu
KELM
276
5
0
21 Jun 2025
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Zhiyuan Liang
Dongwen Tang
Yuhao Zhou
Xuanlei Zhao
Mingjia Shi
...
Damian Borth
Michael M. Bronstein
Yang You
Zinan Lin
Kai Wang
OffRL
240
3
0
19 Jun 2025
OJBench: A Competition Level Code Benchmark For Large Language Models
Zhexu Wang
Y. Liu
Yejie Wang
Wenyang He
Bofei Gao
...
Kelin Fu
Flood Sung
Zhilin Yang
Tianyu Liu
Weiran Xu
ReLM, LRM, ELM
231
3
0
19 Jun 2025
LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
Haoyue Zhang
Hualei Zhang
Xiaosong Ma
Jie Zhang
Song Guo
LRM
272
1
0
19 Jun 2025
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Ling Team
Bin Hu
Cai Chen
Deng Zhao
Ding Liu
...
Zhenglei Zhou
Zhenyu Huang
Zhiqiang Zhang
Zihao Wang
Zujie Wen
OffRL, MoE, ALM, LRM
260
7
0
17 Jun 2025
Optimizing Length Compression in Large Reasoning Models
Zhengxiang Cheng
Dongping Chen
Mingyang Fu
Tianyi Zhou
OffRL, MQ, LRM
289
20
0
17 Jun 2025
Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching
Qizheng Zhang
Michael Wornow
Kunle Olukotun
225
5
0
17 Jun 2025
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
327
125
0
17 Jun 2025
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Xumeng Wen
Zihan Liu
Shun Zheng
Shengyu Ye
...
Yang Wang
Junjie Li
Ziming Miao
Jiang Bian
Mao Yang
LRM
433
59
0
17 Jun 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
Kaiyuan Chen
Y. Ren
Yang Liu
Xiaobo Hu
Haotong Tian
...
Yuan Jiang
Zexuan Liu
Zihan Yin
Zijian Ma
Zhiwen Mo
354
29
0
16 Jun 2025
FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation
Hongda Zhu
Y. Zhang
Bing Zhao
Jingzhe Ding
Siyao Liu
Tong Liu
Dandan Wang
Yanan Liu
Zhaojian Li
201
3
0
16 Jun 2025
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
Zihan Liu
Zhuolin Yang
Yang Chen
Chankyu Lee
Mohammad Shoeybi
Bryan Catanzaro
Wei Ping
OffRL, ReLM, LRM
178
37
0
16 Jun 2025
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition?
Xiangyang Li
Xiaopeng Li
Kuicai Dong
Quanhu Zhang
Rongju Ruan
Xinyi Dai
Xiaoshuang Liu
Shengchun Xu
Yasheng Wang
Ruiming Tang
ELM, LRM
152
1
0
15 Jun 2025
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhenyu Hou
Ziniu Hu
Yujiang Li
Rui Lu
Jie Tang
Yuxiao Dong
OffRL, LRM
193
22
0
13 Jun 2025
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu
Hamish Ivison
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
258
2
0
13 Jun 2025
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zheyuan Yang
Zexi Kuang
Xue Xia
Yilun Zhao
ELM
202
4
0
13 Jun 2025
OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics
Yaoming Zhu
Junxin Wang
Yiyang Li
Lin Qiu
Zongyu Wang
...
Xuezhi Cao
Yuhuai Wei
Mingshi Wang
Xunliang Cai
Rong Ma
LRM
334
3
0
12 Jun 2025