ResearchTrend.AI
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
    ELM

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 560 papers shown
Merge-of-Thought Distillation
Zhanming Shen
Zeyu Qin
Zenan Huang
Hao Chen
J. Hu
Yihong Zhuang
Guoshan Lu
Gang Chen
Junbo Zhao
MoMeLRM
339
3
0
10 Sep 2025
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng
Richard Fan
Shibo Hao
Taylor W. Killian
Haonan Li
...
Xuezhe Ma
Guowei He
Zhiting Hu
Zhengzhong Liu
Eric P. Xing
ReLMOffRLALMLRM
307
5
0
09 Sep 2025
Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference
Xiyu Guo
Shan Wang
Chunfang Ji
Xuefeng Zhao
Wenhao Xi
Y. Liu
Qinglan Li
Chao Deng
Junlan Feng
242
2
0
09 Sep 2025
SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs
Xinyu Zhang
Changzhi Zhou
Linmei Hu
L. Zhang
Xiancai Chen
Haomin Fu
Yang Yang
M. Zhang
SyDa
145
0
0
09 Sep 2025
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet
James Xu Zhao
Bryan Hooi
See-Kiong Ng
LRM
102
5
0
08 Sep 2025
Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing
Yuanteng Chen
Peisong Wang
Yuantian Shao
Nanxin Zeng
Chang Xu
Jian Cheng
MoE
185
0
0
08 Sep 2025
Set Block Decoding is a Language Model Inference Accelerator
Itai Gat
Heli Ben-Hamu
Marton Havasi
Daniel Haziza
Jeremy Reizenstein
Gabriel Synnaeve
David Lopez-Paz
Brian Karrer
Y. Lipman
162
7
0
04 Sep 2025
RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models
Jingjing Liu
Zeming Liu
Zihao Cheng
Mengliang He
Xiaoming Shi
Yuhang Guo
Xiangrong Zhu
Yuanfang Guo
Yunhong Wang
Haifeng Wang
175
2
0
04 Sep 2025
Implicit Reasoning in Large Language Models: A Comprehensive Survey
Jindong Li
Yali Fu
Li Fan
Jiahong Liu
Yao Shu
Chengwei Qin
Menglin Yang
Irwin King
Rex Ying
OffRLLRMAI4CE
234
14
0
02 Sep 2025
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Dongfu Jiang
Yi Lu
Zhuofeng Li
Zhiheng Lyu
Ping Nie
...
Hui Chen
Kai Zou
Chao Du
Tianyu Pang
Wenhu Chen
243
25
0
01 Sep 2025
LongCat-Flash Technical Report
M-A-P Team
Bayan
Bei Li
Bingye Lei
Bo Wang
...
Rongxiang Weng
Ruichen Shao
Rumei Li
Shizhe Wu
Shuai Liang
MLLMMoEVLM
425
16
0
01 Sep 2025
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
Xinyu Tang
Zhenduo Zhang
Y. Liu
Wayne Xin Zhao
Zujie Wen
Zhiqiang Zhang
Jun Zhou
OffRL
110
3
0
01 Sep 2025
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs
Jay Vaghasiya
Omkar Ghugarkar
Vishvesh Bhat
Vipul Dholaria
Julian McAuley
LLMAGReLMLRM
221
1
0
31 Aug 2025
Can Multi-turn Self-refined Single Agent LMs with Retrieval Solve Hard Coding Problems?
Md Tanzib Hosain
Md Kishor Morol
ReLMLRM
117
3
0
30 Aug 2025
A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services
International Symposium on Mixed and Augmented Reality (ISMAR), 2025
Guanzhong Pan
Vishal Chodnekar
Abinas Roy
Haibo Wang
ELM
283
3
0
30 Aug 2025
Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
Haoze Wu
Cheng Wang
Wenshuo Zhao
Junxian He
OffRL
131
4
0
28 Aug 2025
AR²: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
Cheng-Kai Yeh
Hsing-Wang Lee
Chung-Hung Kuo
Hen-Hsen Huang
LRM
58
0
0
27 Aug 2025
Alignment with Fill-In-the-Middle for Enhancing Code Generation
Houxing Ren
Zimu Lu
Weikang Shi
Haotian Hou
Yunqiao Yang
Ke Wang
A-Long Zhou
Junting Pan
Mingjie Zhan
Jiaming Song
108
1
0
27 Aug 2025
LongReasonArena: A Long Reasoning Benchmark for Large Language Models
Jiayu Ding
Shuming Ma
Lei Cui
Nanning Zheng
Furu Wei
LRMELM
114
0
0
26 Aug 2025
Beyond Memorization: Reasoning-Driven Synthesis as a Mitigation Strategy Against Benchmark Contamination
Terry Jingchen Zhang
Gopal Dev
Ning Wang
Nicole Ni
Wenyuan Jiang
Mubashara Akhtar
Bernhard Schölkopf
Mrinmaya Sachan
Zhijing Jin
214
1
0
26 Aug 2025
GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging
Ziyi Ni
Huacan Wang
Shuo Zhang
Shuo Lu
Ziyang He
...
Xin Li
Chen-Hao Hu
Binxing Jiao
Daxin Jiang
Pin Lyu
268
4
0
26 Aug 2025
DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models
K. Yan
Xuanqing Shi
Hongcheng Guo
Wenxuan Wang
Zhuosheng Zhang
Chengwei Qin
LRM
175
0
0
25 Aug 2025
Hermes 4 Technical Report
Ryan Teknium
Roger Jin
Jai Suphavadeeprasit
Dakota Mahan
Jeffrey Quesnelle
Joe Li
Chen Guang
Shannon Sands
Karan Malhotra
129
1
0
25 Aug 2025
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Maojia Song
Tej Deep Pala
Weisheng Jin
Amir Zadeh
Chuan Li
Dorien Herremans
Soujanya Poria
LLMAG
173
3
0
24 Aug 2025
AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions
Zihan Wang
Jiaze Chen
Zhicheng Liu
Markus Mak
Yidi Du
...
Y. Wu
Daoguang Zan
Y. Fu
Mingxuan Wang
Ming Ding
ELM
98
3
0
22 Aug 2025
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Nvidia
Aarti Basant
Abhijit Khairnar
Abhijit Paithankar
Abhinav Khattar
...
Keith Wyss
Keshav Santhanam
Kezhi Kong
Krzysztof Pawelec
Kumar Anik
LRM
298
0
0
20 Aug 2025
G²RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
Yongxin Guo
Wenbo Deng
Zhenglin Cheng
Xiaoying Tang
LRM
148
3
0
18 Aug 2025
Reinforcement Learning with Rubric Anchors
Zenan Huang
Yihong Zhuang
Guoshan Lu
Zeyu Qin
Haokai Xu
...
Yanmei Gu
Y Samuel Wang
Zhengkai Yang
Jianguo Li
Junbo Zhao
ALM
130
22
0
18 Aug 2025
Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
Yiqun Zhang
Hao Li
Jianhao Chen
Hangfan Zhang
Peng Ye
Wenlong Zhang
Shuyue Hu
190
10
0
18 Aug 2025
Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis
Ayoub Ben Chaliah
Hela Dellagi
OffRLLRM
104
0
0
18 Aug 2025
You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
Yutong Bian
Xianhao Lin
Yupeng Xie
Tianyang Liu
Mingchen Zhuge
...
Jiaqi Chen
Xiangru Tang
Yongxin Ni
Sirui Hong
Chenglin Wu
126
1
0
17 Aug 2025
Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Kangyu Wang
Hongliang He
Lin Liu
Ruiqi Liang
Zhenzhong Lan
Jianguo Li
ALMELM
158
0
0
15 Aug 2025
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang
Yuexiang Xie
Yuchang Sun
Yanxi Chen
Guoyin Wang
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
210
33
0
15 Aug 2025
Towards Reliable Multi-Agent Systems for Marketing Applications via Reflection, Memory, and Planning
Lorenzo Jaime Yu Flores
Junyi Shen
Xiaoyuan Gu
144
0
0
14 Aug 2025
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
Xiaojun Wu
Xiaoguang Jiang
Xue Yang
Jucai Zhai
Dengfeng Liu
...
Ninglun Gu
Jin Yang
Kailai Zhang
Yelun Bao
Jun Wang
LRM
141
7
0
13 Aug 2025
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
Niels Mündler
Jasper Dekoninck
Martin Vechev
115
2
0
13 Aug 2025
User-centric Subjective Leaderboard by Customizable Reward Modeling
Qi Jia
Xiujie Song
Zicheng Zhang
Yijin Guo
Kaiwei Zhang
Z. Chen
Guangtao Zhai
ALM
147
1
0
13 Aug 2025
IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization
Yuzhuo Bai
Shitong Duan
Muhua Huang
Jing Yao
Zhenghao Liu
Peng Zhang
Tun Lu
Xiaoyuan Yi
Maosong Sun
Xing Xie
194
1
0
12 Aug 2025
InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
Peiji Li
Jiasheng Ye
Yongkang Chen
Yichuan Ma
Zijie Yu
...
Linyang Li
Qipeng Guo
Dahua Lin
Bowen Zhou
Kai Chen
LLMAGALMLRM
131
11
0
12 Aug 2025
Retrospective Sparse Attention for Efficient Long-Context Generation
Seonghwan Choi
Beomseok Kang
Dongwon Jo
Jae-Joon Kim
90
2
0
12 Aug 2025
PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
Pedro Orvalho
Marta Kwiatkowska
ALM
78
1
0
11 Aug 2025
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
Haoyuan Wu
Haoxing Chen
Xiaodong Chen
Zhanchao Zhou
Tieyuan Chen
...
Junbo Zhao
Lin Liu
Zhenzhong Lan
Bei Yu
Jianguo Li
MoE
137
4
0
11 Aug 2025
Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
Jia-Yi Fu
Xinyu Yang
Hongzhi Zhang
Yahui Liu
Jingyuan Zhang
Qi Wang
Fuzheng Zhang
Guorui Zhou
ELM
257
2
0
07 Aug 2025
Posterior-GRPO: Rewarding Reasoning Processes in Code Generation
Lishui Fan
Yu Zhang
Mouxiang Chen
Zhongxin Liu
OffRLLRM
165
13
0
07 Aug 2025
InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
Shuo Cai
Su Lu
Qi Zhou
Kejing Yang
Zhijie Sang
C. Xie
Hongxia Yang
ReLMLRM
182
1
0
07 Aug 2025
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang
Haihong E
Jiacheng Liu
Zhongjun Yang
Rongjin Li
...
Yiling Huang
Xinyi Hu
Qing Huang
Zijian Xie
Shiyao Peng
159
2
0
06 Aug 2025
Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment
Aleksander Boruch-Gruszecki
Yangtian Zi
Zixuan Wu
Tejas Oberoi
Carolyn Jane Anderson
Joydeep Biswas
Arjun Guha
SyDaOffRL
146
2
0
06 Aug 2025
CTTS: Collective Test-Time Scaling
Zhende Song
Shengji Tang
Peng Ye
Jiayuan Fan
Tao Chen
Tao Chen
Wanli Ouyang
LRM
197
1
0
05 Aug 2025
RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
Junyao Yang
Jianwei Wang
Huiping Zhuang
Cen Chen
Ziqian Zeng
MoMeLRM
173
1
0
05 Aug 2025
Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework
Jialin Li
Jinzhe Li
Gengxu Li
Yi-Ju Chang
Yuan Wu
LRM
138
0
0
05 Aug 2025
Page 4 of 12