LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

International Conference on Learning Representations (ICLR), 2024

12 March 2024

Tianjun Zhang

Papers citing "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"

50 / 560 papers shown

Automated Research Article Classification and Recommendation Using NLP and ML

07 Oct 2025

MixReasoning: Switching Modes to Think

120

07 Oct 2025

VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code

103

07 Oct 2025

Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices

100

07 Oct 2025

Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning

161

06 Oct 2025

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

181

06 Oct 2025

Toward a unified framework for data-efficient evaluation of large language models

05 Oct 2025

PLSemanticsBench: Large Language Models As Programming Language Interpreters

213

03 Oct 2025

On the Role of Temperature Sampling in Test-Time Scaling

102

02 Oct 2025

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning

...

200

02 Oct 2025

InvThink: Towards AI Safety via Inverse Reasoning

AI4CE ReLM SILM MU LRM

282

02 Oct 2025

RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training

...

124

01 Oct 2025

ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models

Akshat Ramachandran

Marina Neseem

Charbel Sakr

Rangharajan Venkatesan

Brucek Khailany

Tushar Krishna

MQ LRM VLM

150

01 Oct 2025

RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning

271

30 Sep 2025

Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners

305

30 Sep 2025

Learning to Reason as Action Abstractions with Scalable Mid-Training RL

302

30 Sep 2025

Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs

107

30 Sep 2025

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

...

218

30 Sep 2025

Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing

114

30 Sep 2025

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

...

144

30 Sep 2025

AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models

29 Sep 2025

MemGen: Weaving Generative Latent Memory for Self-Evolving Agents

398

29 Sep 2025

Your thoughts tell who you are: Characterize the reasoning patterns of LRMs

165

29 Sep 2025

PIPer: On-Device Environment Setup via Online Reinforcement Learning

Alexander Kovrigin

Aleksandra V. Eliseeva

111

29 Sep 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

140

29 Sep 2025

UI-UG: A Unified MLLM for UI Understanding and Generation

...

194

29 Sep 2025

AutoCode: LLMs as Problem Setters for Competitive Programming

...

29 Sep 2025

ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models

194

29 Sep 2025

ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation

28 Sep 2025

Evaluating Program Semantics Reasoning with Type Inference in System F

Yifeng He

Luning Yang

Christopher Castro Gaw Gonzalo

Hao Chen

ReLM LRM

587

28 Sep 2025

HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs

...

164

28 Sep 2025

Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

26 Sep 2025

SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

125

26 Sep 2025

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

206

26 Sep 2025

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

120

26 Sep 2025

Variational Reasoning for Language Models

213

26 Sep 2025

Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards

...

161

26 Sep 2025

Who's Laughing Now? An Overview of Computational Humour Generation and Explanation

141

25 Sep 2025

Verification Limits Code LLM Training

129

25 Sep 2025

PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

176

24 Sep 2025

Thinking Augmented Pre-training

322

24 Sep 2025

Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding

Mohamed S. Abdelfattah

Kai-Chiang Wu

216

22 Sep 2025

MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM

22 Sep 2025

FlowRL: Matching Reward Distributions for LLM Reasoning

...

259

18 Sep 2025

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

159

17 Sep 2025

FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction

171

16 Sep 2025

SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems

...

108

16 Sep 2025

Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models

189

15 Sep 2025

Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction

153

13 Sep 2025

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

301

11 Sep 2025