LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
    ELM

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 560 papers shown
Automated Research Article Classification and Recommendation Using NLP and ML
Shadikur Rahman
Hasibul Karim Shanto
Umme Ayman Koana
Syed Muhammad Danish
99
0
0
07 Oct 2025
MixReasoning: Switching Modes to Think
Haiquan Lu
Gongfan Fang
Xinyin Ma
Qi Li
Xinchao Wang
LRM
120
4
0
07 Oct 2025
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Lingfei Zeng
Fengdi Che
Xuhan Huang
Fei Ye
X. Xu
Hang Zhao
Jie Fu
103
1
0
07 Oct 2025
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices
Mallika Mainali
Harsha Sureshbabu
Anik Sen
Christopher B. Rauch
Noah Reifsnyder
John Meyer
J. T. Turner
Michael W. Floyd
M. Molineaux
Rosina O. Weber
100
0
0
07 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
161
1
0
06 Oct 2025
Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
Yicheng Tao
Yao Qin
Yepang Liu
3DV
181
5
0
06 Oct 2025
Toward a unified framework for data-efficient evaluation of large language models
Lele Liao
Qile Zhang
Ruofan Wu
Guanhua Fang
98
1
0
05 Oct 2025
PLSemanticsBench: Large Language Models As Programming Language Interpreters
Aditya Thimmaiah
Jiyang Zhang
Jayanth Srinivasa
Junyi Jessy Li
Miloš Gligorić
ReLM, LRM
213
0
0
03 Oct 2025
On the Role of Temperature Sampling in Test-Time Scaling
Yuheng Wu
Azalia Mirhoseini
Thierry Tambe
ALM, LRM
102
1
1
02 Oct 2025
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
Zhenyu Pan
Y. Zhang
Zhuo Liu
Y. Tang
Zeliang Zhang
...
Haoyang Fang
Manling Li
Chenliang Xu
Philip S. Yu
Han Liu
AAML
200
0
0
02 Oct 2025
InvThink: Towards AI Safety via Inverse Reasoning
Yubin Kim
Taehan Kim
Lizhou Fan
Chunjong Park
C. Breazeal
Daniel J. McDuff
Hae Won Park
AI4CE, ReLM, SILM, MU, LRM
282
1
0
02 Oct 2025
RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training
Tao Ren
Jinyang Jiang
Hui Yang
Wan Tian
Minhao Zou
...
Shentao Qin
Yanjun Zhao
Rui Tao
Hui Shao
Yijie Peng
124
1
0
01 Oct 2025
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Akshat Ramachandran
Marina Neseem
Charbel Sakr
Rangharajan Venkatesan
Brucek Khailany
Tushar Krishna
MQ, LRM, VLM
150
1
1
01 Oct 2025
RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning
Gang Li
Yulei Qin
Xiaoyu Tan
Dingkang Yang
Yuchen Shi
Zihan Xu
Xiang Li
Xing Sun
Ke Li
OffRL, ReLM, LRM
271
0
0
30 Sep 2025
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Xin Xu
Cliveb AI
Kai Yang
Tianhao Chen
Yang Wang
Saiyong Yang
Can Yang
OffRL, ReLM, LRM
305
2
0
30 Sep 2025
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
Shenao Zhang
Donghan Yu
Yihao Feng
Bowen Jin
Zhaoran Wang
John Peebles
Zirui Wang
OffRL, ReLM, LRM
302
0
0
30 Sep 2025
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
H. Dai
Maoquan Wang
Mengnan Qi
Yikai Zhang
Zijian Jin
Yongqiang Yao
Yufan Huang
Shengyu Fu
Elsie Nallipogu
LLMAG
107
0
0
30 Sep 2025
Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
S. Venkatraman
Vineet Jain
Sarthak Mittal
Vedant Shah
J. Obando-Ceron
...
B. Kailkhura
Guillaume Lajoie
Glen Berseth
Nikolay Malkin
Moksh Jain
ReLM, AIFin, LRM
218
3
0
30 Sep 2025
Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
Yang Tang
Ruijie Liu
Yifan Wang
Shiyu Li
Xi Chen
114
0
0
30 Sep 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Minhui Zhu
Minyang Tian
Xiaocheng Yang
Tianci Zhou
Lifan Yuan
...
Ruixing Zhang
X. Wang
Ofir Press
Nicolas Chia
Eliu A. Huerta
LRM, ELM
144
2
0
30 Sep 2025
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Zihao Zhu
Xinyu Wu
Gehan Hu
Siwei Lyu
Ke Xu
Baoyuan Wu
LRM
99
1
0
29 Sep 2025
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Guibin Zhang
Muxin Fu
Shuicheng Yan
LLMAG
398
9
0
29 Sep 2025
Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
Yida Chen
Yuning Mao
Xianjun Yang
Suyu Ge
Shengjie Bi
Lijuan Liu
Saghar Hosseini
L Tan
Yixin Nie
Shaoliang Nie
LRM
165
0
0
29 Sep 2025
PIPer: On-Device Environment Setup via Online Reinforcement Learning
Alexander Kovrigin
Aleksandra V. Eliseeva
Konstantin Grotov
Egor Bogomolov
Yaroslav Zharov
OffRL
111
0
0
29 Sep 2025
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang
Yue Ding
Jingwen Yang
Tianwei Luo
Dongbai Li
Ranjie Duan
Qiang Liu
Hang Su
Yinpeng Dong
Jun Zhu
LRM
140
1
0
29 Sep 2025
UI-UG: A Unified MLLM for UI Understanding and Generation
Hao Yang
Weijie Qiu
Ru Zhang
Zhou Fang
Ruichao Mao
...
Maji Huang
Longxiang Zhang
Teng Guo
Shuoyang Liu
Hai Rao
MLLM
194
1
0
29 Sep 2025
AutoCode: LLMs as Problem Setters for Competitive Programming
Shang Zhou
Zihan Zheng
Kaiyuan Liu
Zeyu Shen
Zerui Cheng
...
Peter Henderson
Natasha Jaques
Pramod Viswanath
Saining Xie
Jingbo Shang
99
1
0
29 Sep 2025
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
J. Liu
Sijun He
Jingjing Wu
X. Wang
Yang Chen
Zhaoqi Kuang
Siqi Bao
Yuan Yao
ELM, LRM
194
0
0
29 Sep 2025
ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation
Haonan Wang
Junfeng Sun
Xingdi Yuan
Ruoyao Wang
Ziang Xiao
95
0
0
28 Sep 2025
Evaluating Program Semantics Reasoning with Type Inference in System F
Yifeng He
Luning Yang
Christopher Castro Gaw Gonzalo
Hao Chen
ReLM, LRM
587
1
0
28 Sep 2025
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
K. Deng
Zizheng Zhan
Wen Xiang
Wenqiang Zhu
Tianhao Peng
...
Jie Liu
Zhaoxiang Zhang
Haotian Zhang
Bin Chen
Jiaheng Liu
LRM
164
2
0
28 Sep 2025
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
Yixuan Han
Fan Ma
Ruijie Quan
Yi Yang
MoE, LRM
99
0
0
26 Sep 2025
SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
Kanghoon Yoon
Minsub Kim
Sungjae Lee
Joonhyung Lee
Sunghyeon Woo
Yeonjun In
S. Kwon
Chanyoung Park
Dongsoo Lee
125
1
0
26 Sep 2025
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Yoonjeon Kim
Doohyuk Jang
Eunho Yang
ReLM, AIFin, LRM
206
1
0
26 Sep 2025
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Syeda Nahida Akter
Shrimai Prabhumoye
Eric Nyberg
M. Patwary
Mohammad Shoeybi
Yejin Choi
Bryan Catanzaro
AIFin, LRM, AI4CE
120
6
0
26 Sep 2025
Variational Reasoning for Language Models
Xiangxin Zhou
Zichen Liu
Haonan Wang
Chao Du
Min Lin
Chongxuan Li
Liang Wang
Tianyu Pang
OffRL, LRM
213
0
0
26 Sep 2025
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
Aaron Tu
Weihao Xuan
Heli Qi
X. Y. Huang
Qingcheng Zeng
...
Amin Saberi
Naoto Yokoya
Jure Leskovec
Yejin Choi
Fang Wu
OffRL
161
3
0
26 Sep 2025
Who's Laughing Now? An Overview of Computational Humour Generation and Explanation
Tyler Loakman
William Thorne
Chenghua Lin
LRM
141
2
0
25 Sep 2025
Verification Limits Code LLM Training
Srishti Gureja
Elena Tommasone
Jingyi He
Sara Hooker
Matthias Gallé
Marzieh Fadaee
ALM, OffRL
129
1
0
25 Sep 2025
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Xueliang Zhao
Wei Wu
Jian Guan
Zhuocheng Gong
Lingpeng Kong
ReLM, OffRL, LRM, AI4TS
176
1
0
24 Sep 2025
Thinking Augmented Pre-training
Liang Wang
Nan Yang
Shaohan Huang
Li Dong
Furu Wei
LRM
322
2
0
24 Sep 2025
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
Pei-Shuo Wang
Jian-Jia Chen
Chun-Che Yang
Chi-chih Chang
N. Huang
Mohamed S. Abdelfattah
Kai-Chiang Wu
MQ
216
0
0
22 Sep 2025
MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM
W. Lee
Junhee Cho
Jungwook Choi
LLMAG, ALM
97
0
0
22 Sep 2025
FlowRL: Matching Reward Distributions for LLM Reasoning
Xuekai Zhu
Daixuan Cheng
D. Zhang
Hengli Li
Kaiyan Zhang
...
J. Gao
Xiaodong Liu
Bowen Zhou
Hongyuan Mei
Zhouhan Lin
LRM
259
7
0
18 Sep 2025
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Qikai Chang
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Yicheng Pan
Jianshu Zhang
Jun Du
Quan Liu
J. Gao
OffRL, LRM
159
3
0
17 Sep 2025
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai
Xiaozhuan Liang
X. Wang
Jin Ma
Haijin Liang
Jinwen Luo
Xinyu Zuo
Lisheng Duan
Yuyang Yin
Xi Chen
171
1
0
16 Sep 2025
SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems
Xifeng Yao
Dongyu Lang
Wu Zhang
Xintong Guo
Huarui Xie
...
Ping Liu
Guang Shen
Yi Bai
Dandan Tu
Changzheng Zhang
108
0
0
16 Sep 2025
Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models
Jian-Xun Wang
Xiaofei Xie
Q. Hu
Shangqing Liu
Yi Li
LRM
189
2
0
15 Sep 2025
Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction
Yijun Liu
Yixuan Wang
Yuzhuang Xu
Shiyu Ji
Yang Xu
Qingfu Zhu
Wanxiang Che
153
0
0
13 Sep 2025
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Akshit Sinha
Arvindh Arun
Shashwat Goel
Steffen Staab
Jonas Geiping
ALM, LRM
301
9
0
11 Sep 2025