LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024

12 March 2024

Tianjun Zhang

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 559 papers shown

Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning

309

24 Dec 2025

Learning to Orchestrate Agents in Natural Language with the Conductor

107

04 Dec 2025

TRINITY: An Evolved LLM Coordinator

239

04 Dec 2025

Counting Without Running: Evaluating LLMs' Reasoning About Code Complexity

239

04 Dec 2025

Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning

402

02 Dec 2025

LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess

156

01 Dec 2025

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

...

Mohamed S. Abdelfattah

190

01 Dec 2025

InnoGym: Benchmarking the Innovation Potential of AI Agents

...

01 Dec 2025

Rectifying LLM Thought from Lens of Optimization

127

01 Dec 2025

Lightweight Latent Reasoning for Narrative Tasks

01 Dec 2025

G-KV: Decoding-Time KV Cache Eviction with Global Attention

29 Nov 2025

Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction

...

261

28 Nov 2025

Qwen3-VL Technical Report

...

1.7K

26 Nov 2025

Soft Adaptive Policy Optimization

314

25 Nov 2025

RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation

176

25 Nov 2025

AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

...

171

24 Nov 2025

LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs

254

23 Nov 2025

E^3-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models

230

21 Nov 2025

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Ali Taghibakhshi

Sharath Turuvekere Sreenivas

...

20 Nov 2025

AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

748

17 Nov 2025

P1: Mastering Physics Olympiads with Reinforcement Learning

...

334

17 Nov 2025

Incoherent Beliefs & Inconsistent Actions in Large Language Models

348

17 Nov 2025

MACEval: A Multi-Agent Continual Evaluation Network for Large Models

222

12 Nov 2025

VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation

151

11 Nov 2025

AlphaResearch: Accelerating New Algorithm Discovery with Language Models

111

11 Nov 2025

RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

...

203

10 Nov 2025

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

...

117

10 Nov 2025

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

168

09 Nov 2025

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

Parthasarathy Ranganathan

08 Nov 2025

Revisiting Entropy in Reinforcement Learning for Large Reasoning Models

125

08 Nov 2025

An Empirical Study of Reasoning Steps in Thinking Code LLMs

08 Nov 2025

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

...

352

07 Nov 2025

Motif 2 12.7B technical report

...

102

07 Nov 2025

NVIDIA Nemotron Nano V2 VL

Nvidia

Amala Sanjay Deshmukh

...

312

06 Nov 2025

Reusing Pre-Training Data at Test Time is a Compute Multiplier

106

06 Nov 2025

CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling

05 Nov 2025

Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining

Costin-Andrei Oncescu

214

04 Nov 2025

LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge

...

443

03 Nov 2025

The Future of Generative AI in Software Engineering: A Vision from Industry and Academia in the European GENIUS Project

...

183

03 Nov 2025

GenDexHand: Generative Simulation for Dexterous Hands

126

03 Nov 2025

KV Cache Transform Coding for Compact Storage in LLM Inference

Konrad Staniszewski

Adrian Łańcucki

425

03 Nov 2025

HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning

124

02 Nov 2025

Reasoning Planning for Language Models

461

01 Nov 2025

VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision

119

31 Oct 2025

SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation

140

31 Oct 2025

ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus

Michael D. Moffitt

234

31 Oct 2025

LongCat-Flash-Omni Technical Report

...

591

31 Oct 2025

The End of Manual Decoding: Towards Truly End-to-End Language Models

418

30 Oct 2025

BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning

168

30 Oct 2025

Kimi Linear: An Expressive, Efficient Attention Architecture

...

138

30 Oct 2025