2107.03374
Evaluating Large Language Models Trained on Code

7 July 2021

Henrique Pondé

Harrison Edwards

Nicholas Joseph

Gretchen Krueger

Mohammad Bavarian

Philippe Tillet

Matthias Plappert

Fotios Chantzis

Elizabeth Barnes

Ariel Herbert-Voss

William H. Guss

Igor Babuschkin

William Saunders

Christopher Hesse

Wojciech Zaremba

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,505 papers shown

MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement

128

1

0

30 Sep 2025

Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

314

2

0

30 Sep 2025

Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking

152

2

0

30 Sep 2025

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

112

4

0

30 Sep 2025

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

...

142

2

0

30 Sep 2025

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

1.5K

1

0

30 Sep 2025

ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models

194

0

0

29 Sep 2025

InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation

...

169

5

0

29 Sep 2025

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards

115

1

0

29 Sep 2025

UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following

128

1

0

29 Sep 2025

Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search

...

Benoit Dumoulin

111

1

0

29 Sep 2025

Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models

117

0

0

29 Sep 2025

LLaDA-MoE: A Sparse MoE Diffusion Language Model

...

251

12

0

29 Sep 2025

RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance

149

2

0

29 Sep 2025

DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

143

0

0

29 Sep 2025

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Marianna Nezhurina

...

Aleksandra Krasnodębska

Christoph Schuhmann

Mats Leon Richter

230

1

0

29 Sep 2025

Short window attention enables long-term memorization

Maximilian Beck

Gergely Szilvasy

Pierre-Emmanuel Mazaré

Gabriel Synnaeve

150

1

0

29 Sep 2025

ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models

44

0

0

29 Sep 2025

SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models

132

0

0

29 Sep 2025

Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development

419

3

0

29 Sep 2025

Agentic Exploration of Physics Models

Maximilian Nägele

Florian Marquardt

207

1

0

29 Sep 2025

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

...

Lei Bai

452

3

1

29 Sep 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

137

1

0

29 Sep 2025

GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training

172

5

0

29 Sep 2025

MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems

127

0

0

29 Sep 2025

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

Changsheng Zhao

Chia-Jung Chang

...

Raghuraman Krishnamoorthi

203

3

0

29 Sep 2025

Evaluating SAP Joule for Code Generation

Johannes Reisinger

Andreas Fischer

88

0

0

29 Sep 2025

Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding

196

0

0

29 Sep 2025

Fast Thinking for Large Language Models

OffRL LLMAG ReLM LRM

252

2

0

28 Sep 2025

LLM/Agent-as-Data-Analyst: A Survey

...

239

6

0

28 Sep 2025

Sequential Diffusion Language Models

...

111

5

0

28 Sep 2025

Future-Proofing Programmers: Optimal Knowledge Tracing for AI-Assisted Personalized Education

105

0

0

28 Sep 2025

Timber: Training-free Instruct Model Refining with Base via Effective Rank

114

1

0

28 Sep 2025

Diagnosing Failure Root Causes in Platform-Orchestrated Agentic Systems: Dataset, Taxonomy, and Benchmark

176

0

0

28 Sep 2025

Anchored Supervised Fine-Tuning

195

0

0

28 Sep 2025

PerfBench: Can Agents Resolve Real-World Performance Bugs?

Roshanak Zilouchian Moghaddam

Neel Sundaresan

185

0

0

28 Sep 2025

Toward Preference-aligned Large Language Models via Residual-based Model Steering

Andrea Tagarelli

163

0

0

28 Sep 2025

Pretraining Scaling Laws for Generative Evaluations of Language Models

Rylan Schaeffer

124

1

0

28 Sep 2025

Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms

55

0

0

28 Sep 2025

HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs

...

Zhaoxiang Zhang

161

2

0

28 Sep 2025

Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs

108

1

0

27 Sep 2025

Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction

211

0

0

27 Sep 2025

d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching

106

6

0

27 Sep 2025

Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Melody Zixuan Li

Kumar Krishna Agrawal

Komal Kumar Teru

Guillaume Lajoie

Blake A. Richards

201

3

0

27 Sep 2025

RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval

Debojyoti Dutta

154

0

0

27 Sep 2025

SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems

Ivan Beschastnikh

140

0

0

27 Sep 2025

BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software

Ati Priya Bajaj

...

Yan Shoshitaishvili

61

0

0

27 Sep 2025

Artificial Intelligence-Powered Assessment Framework for Skill-Oriented Engineering Lab Education

Vaishnavi Sharma

Shashwat Sharma

Kritika Panjanani

62

0

0

27 Sep 2025

SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts

Anxiang Zeng

Jinsong Su

200

5

0

27 Sep 2025

Protocode: Prototype-Driven Interpretability for Code Generation in LLMs

Krishna Vamshi Bodla

127

1

0

27 Sep 2025
