ResearchTrend.AI

Versions: v1, v2 (latest)

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
    ELM
Links: arXiv (abs), PDF, HTML, HuggingFace (3 upvotes)

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 559 papers shown
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
307
3
0
24 Dec 2025
Learning to Orchestrate Agents in Natural Language with the Conductor
Stefan Nielsen
Edoardo Cetin
Peter Schwendeman
Qi Sun
Jinglue Xu
Yujin Tang
LLMAG
104
1
0
04 Dec 2025
TRINITY: An Evolved LLM Coordinator
Jinglue Xu
Qi Sun
Peter Schwendeman
Stefan Nielsen
Edoardo Cetin
Yujin Tang
LLMAG
239
0
0
04 Dec 2025
Counting Without Running: Evaluating LLMs' Reasoning About Code Complexity
Gregory Bolet
Giorgis Georgakoudis
K. Parasyris
Harshitha Menon
N. Hasabnis
Kirk W. Cameron
Gal Oren
ALM, LRM
238
0
0
04 Dec 2025
Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
Haonan Wang
Chao Du
Kenji Kawaguchi
Tianyu Pang
MoMe, ReLM, LRM
402
0
0
02 Dec 2025
LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
Sai Kolasani
Maxim Saplin
Nicholas Crispino
Kyle Montgomery
Jared Quincy Davis
Matei A. Zaharia
Chi Wang
Chenguang Wang
ELM, LRM
156
1
0
01 Dec 2025
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Yilong Zhao
Jiaming Tang
Kan Zhu
Zihao Ye
Chi-chih Chang
...
Mohamed S. Abdelfattah
Mingyu Gao
Baris Kasikci
Song Han
Ion Stoica
ReLM, LRM
189
1
0
01 Dec 2025
InnoGym: Benchmarking the Innovation Potential of AI Agents
Jintian Zhang
Kewei Xu
Jingsheng Zheng
Zhuoyun Yu
Yuqi Zhu
...
Lun Du
Da Zheng
Shumin Deng
Huajun Chen
Ningyu Zhang
60
1
0
01 Dec 2025
Rectifying LLM Thought from Lens of Optimization
J. Liu
Hongwei Liu
Songyang Zhang
Kai Chen
LRM
126
1
0
01 Dec 2025
Lightweight Latent Reasoning for Narrative Tasks
Alexander Gurung
Nikolay Malkin
Mirella Lapata
OffRL, LRM
92
0
0
01 Dec 2025
G-KV: Decoding-Time KV Cache Eviction with Global Attention
Mengqi Liao
Lu Wang
Chaoyun Zhang
Zekai Shen
Xiaowei Mao
Si Qin
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Huaiyu Wan
76
0
0
29 Nov 2025
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
Bao Shu
Yan Cai
Jianjian Sun
Chunrui Han
En Yu
...
Yuang Peng
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Xiangyu Yue
LLMAG, KELM, LRM
260
0
0
28 Nov 2025
Qwen3-VL Technical Report
Shuai Bai
Yuxuan Cai
Ruizhe Chen
Keqin Chen
Xionghui Chen
...
Jingren Zhou
F. I. S. Kevin Zhou
J. Zhou
Yuanzhi Zhu
Ke Zhu
VLM
1.7K
64
0
26 Nov 2025
Soft Adaptive Policy Optimization
Chang Gao
Chujie Zheng
Xiong-Hui Chen
Kai Dang
Shixuan Liu
Bowen Yu
An Yang
Shuai Bai
Jingren Zhou
Junyang Lin
311
5
0
25 Nov 2025
RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
Yuanyuan Lin
Xiangyu Ouyang
Teng Zhang
Kaixin Sui
176
0
0
25 Nov 2025
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
Jiayi Zhang
Yiran Peng
Fanqi Kong
Yang Cheng
Yifan Wu
...
Hongzhang Liu
Xiangru Tang
Bang Liu
Chenglin Wu
Yuyu Luo
171
2
0
24 Nov 2025
LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs
Akashdeep Saha
Zeng Wang
Prithwish Basu Roy
J. Knechtel
Ozgur Sinanoglu
Ramesh Karri
AI4CE, LRM
254
1
0
23 Nov 2025
E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models
Tao Yuan
Haoli Bai
Yinfei Pan
Xuyang Cao
Tianyu Zhang
Lu Hou
Ting Hu
Xianzhi Yu
VLM
230
0
0
21 Nov 2025
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Ali Taghibakhshi
Sharath Turuvekere Sreenivas
Saurav Muralidharan
Ruisi Cai
Marcin Chochowski
...
Jan Kautz
Bryan Catanzaro
Ashwath Aithal
Nima Tajbakhsh
Pavlo Molchanov
96
0
0
20 Nov 2025
AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models
Declan Jackson
William Keating
George Cameron
Micah Hill-Smith
HILM, RALM, ELM
743
0
0
17 Nov 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
Jiacheng Chen
Qianjia Cheng
F. Yu
Haiyuan Wan
Yuchen Zhang
...
Yu Cheng
Ning Ding
Bowen Zhou
Peng Ye
Ganqu Cui
ReLM, LRM, AI4CE
334
1
0
17 Nov 2025
Incoherent Beliefs & Inconsistent Actions in Large Language Models
Arka Pal
Teo Kitanovski
Arthur Liang
Akilesh Potti
Micah Goldblum
348
0
0
17 Nov 2025
MACEval: A Multi-Agent Continual Evaluation Network for Large Models
Z. Chen
Yuze Sun
Yuan Tian
Wenjun Zhang
Guangtao Zhai
ALM, ELM
221
1
0
12 Nov 2025
VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation
Arpan Phukan
Anupam Pandey
Deepjyoti Bodo
Asif Ekbal
LRM
151
1
0
11 Nov 2025
AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Zhaojian Yu
Kaiyue Feng
Yilun Zhao
Shilin He
Xiao-Ping Zhang
Arman Cohan
110
1
0
11 Nov 2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
203
0
0
10 Nov 2025
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Zhiyuan Zeng
Hamish Ivison
Yiping Wang
Lifan Yuan
Shuyue Stella Li
...
S. Du
Natasha Jaques
Hao Peng
Pang Wei Koh
Hannaneh Hajishirzi
OffRL, LRM
115
6
0
10 Nov 2025
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Sen Xu
Yi Zhou
Wei Wang
Jixin Min
Z. Yin
Yingwei Dai
Shixi Liu
Lianyu Pang
Yirong Chen
J. Zhang
MoE, LRM, VLM
168
1
0
09 Nov 2025
SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
Jeffrey Ma
Milad Hashemi
Amir Yazdanbakhsh
Kevin Swersky
Ofir Press
Enhui Li
Vijay Janapa Reddi
Parthasarathy Ranganathan
95
2
0
08 Nov 2025
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
Renren Jin
Pengzhi Gao
Yuqi Ren
Zhuowen Han
Tongxuan Zhang
Wuwei Huang
Wei Liu
Jian Luan
Deyi Xiong
LRM
125
1
0
08 Nov 2025
An Empirical Study of Reasoning Steps in Thinking Code LLMs
Haoran Xue
Gias Uddin
Song Wang
LRM
96
1
0
08 Nov 2025
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
Jingxuan Xu
K. Deng
W. Li
Songwei Yu
Huaixi Tang
...
Zhaoxiang Zhang
Yuqun Zhang
H. Zhang
Bin Chen
Jiaheng Liu
ELM
352
1
0
07 Nov 2025
Motif 2 12.7B technical report
Junghwan Lim
S. W. Lee
Dongseok Kim
Taehyun Kim
Eunhwan Park
...
Kungyu Lee
Dongpin Oh
Yeongjae Park
Bokki Ryu
Dongjoo Weon
102
0
0
07 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
311
2
0
06 Nov 2025
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Alex Fang
Thomas Voice
Ruoming Pang
Ludwig Schmidt
Tom Gunter
106
0
0
06 Nov 2025
CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling
Zekai Qu
Yinxu Pan
Ao Sun
Chaojun Xiao
Xu Han
81
0
0
05 Nov 2025
Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
Costin-Andrei Oncescu
Qingyang Wu
Wai Tong Chung
Robert Wu
Bryan Gopal
Junxiong Wang
Tri Dao
Ben Athiwaratkun
MoE
214
0
0
04 Nov 2025
LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge
Heng Zhou
Ao Yu
Yuchen Fan
Jianing Shi
Li Kang
...
Y. Wu
Tiancheng He
Yiran Qin
Wenlong Zhang
Zhenfei Yin
KELM, RALM
443
1
0
03 Nov 2025
The Future of Generative AI in Software Engineering: A Vision from Industry and Academia in the European GENIUS Project
Robin Gröpler
Steffen Klepke
Jack Johns
Andreas Dreschinski
Klaus Schmid
...
Johannes Viehmann
Selin Şirin Aslangül
Beum Seuk Lee
Adam Ziolkowski
Eric Zie
182
1
0
03 Nov 2025
GenDexHand: Generative Simulation for Dexterous Hands
Feng Chen
Zhuxiu Xu
Tianzhe Chu
Xunzhe Zhou
Li Sun
Zewen Wu
Shenghua Gao
Zhongyu Li
Yanchao Yang
Yi Ma
126
0
0
03 Nov 2025
KV Cache Transform Coding for Compact Storage in LLM Inference
Konrad Staniszewski
Adrian Łańcucki
VLM
425
0
0
03 Nov 2025
HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning
Yujian Liu
Jiabao Ji
Yang Zhang
Wenbo Guo
Tommi Jaakkola
Shiyu Chang
121
0
0
02 Nov 2025
Reasoning Planning for Language Models
Bao Nguyen
Hieu Trung Nguyen
Ruifeng She
Xiaojin Fu
V. Nguyen
ReLM, LRM
461
0
0
01 Nov 2025
VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision
Xuan Gong
Senmiao Wang
Hanbo Huang
Ruoyu Sun
Shiyu Liang
OffRL, LRM
118
0
0
31 Oct 2025
SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation
Yixiang Chen
Tianshi Zheng
Shijue Huang
Zhitao He
Yi R. Fung
140
0
0
31 Oct 2025
ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus
Michael D. Moffitt
234
1
0
31 Oct 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
590
5
0
31 Oct 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models
Z. Wang
Dongyang Ma
X. Y. Huang
Deng Cai
Tian Lan
J. Xu
Haitao Mi
Xiaoying Tang
Yan Wang
SyDa, OffRL
417
0
0
30 Oct 2025
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Qianli Shen
Daoyuan Chen
Yilun Huang
Zhenqing Ling
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
168
0
0
30 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
138
11
0
30 Oct 2025