LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024

12 March 2024

Tianjun Zhang

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 560 papers shown

Merge-of-Thought Distillation

339

10 Sep 2025

K2-Think: A Parameter-Efficient Reasoning System

...

307

09 Sep 2025

Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference

242

09 Sep 2025

SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs

145

09 Sep 2025

Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet

102

08 Sep 2025

Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing

185

08 Sep 2025

Set Block Decoding is a Language Model Inference Accelerator

162

04 Sep 2025

RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models

175

04 Sep 2025

Implicit Reasoning in Large Language Models: A Comprehensive Survey

234

02 Sep 2025

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

...

243

01 Sep 2025

LongCat-Flash Technical Report

...

425

01 Sep 2025

Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward

110

01 Sep 2025

CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs

221

31 Aug 2025

Can Multi-turn Self-refined Single Agent LMs with Retrieval Solve Hard Coding Problems?

Md Tanzib Hosain

Md Kishor Morol

ReLM LRM

117

30 Aug 2025

A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services

International Symposium on Mixed and Augmented Reality (ISMAR), 2025

283

30 Aug 2025

Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions

131

28 Aug 2025

AR²: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models

27 Aug 2025

Alignment with Fill-In-the-Middle for Enhancing Code Generation

108

27 Aug 2025

LongReasonArena: A Long Reasoning Benchmark for Large Language Models

114

26 Aug 2025

Beyond Memorization: Reasoning-Driven Synthesis as a Mitigation Strategy Against Benchmark Contamination

214

26 Aug 2025

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

...

268

26 Aug 2025

DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models

175

25 Aug 2025

Hermes 4 Technical Report

129

25 Aug 2025

LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions

173

24 Aug 2025

AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions

...

22 Aug 2025

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

...

298

20 Aug 2025

G²RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance

148

18 Aug 2025

Reinforcement Learning with Rubric Anchors

...

130

18 Aug 2025

Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

190

18 Aug 2025

Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis

Ayoub Ben Chaliah

Hela Dellagi

OffRL LRM

104

18 Aug 2025

You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation

...

126

17 Aug 2025

Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps

158

15 Aug 2025

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

210

15 Aug 2025

Towards Reliable Multi-Agent Systems for Marketing Applications via Reflection, Memory, and Planning

Lorenzo Jaime Yu Flores

Junyi Shen

Xiaoyuan Gu

144

14 Aug 2025

Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning

...

141

13 Aug 2025

Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Niels Mündler

Jasper Dekoninck

Martin Vechev

115

13 Aug 2025

User-centric Subjective Leaderboard by Customizable Reward Modeling

147

13 Aug 2025

IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization

194

12 Aug 2025

InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling

...

131

12 Aug 2025

Retrospective Sparse Attention for Efficient Long-Context Generation

12 Aug 2025

PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C

Pedro Orvalho

Marta Kwiatkowska

ALM

11 Aug 2025

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

...

137

11 Aug 2025

Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning

257

07 Aug 2025

Posterior-GRPO: Rewarding Reasoning Processes in Code Generation

165

07 Aug 2025

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

182

07 Aug 2025

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

...

159

06 Aug 2025

Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment

Aleksander Boruch-Gruszecki

Yangtian Zi

Zixuan Wu

Tejas Oberoi

Carolyn Jane Anderson

Joydeep Biswas

Arjun Guha

SyDa OffRL

146

06 Aug 2025

CTTS: Collective Test-Time Scaling

197

05 Aug 2025

RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior

173

05 Aug 2025

Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework

138

05 Aug 2025