arXiv:2107.03374

Evaluating Large Language Models Trained on Code

7 July 2021

Henrique Pondé

Harrison Edwards

Nicholas Joseph

Gretchen Krueger

Mohammad Bavarian

Philippe Tillet

Matthias Plappert

Fotios Chantzis

Elizabeth Barnes

Ariel Herbert-Voss

William H. Guss

Igor Babuschkin

William Saunders

Christopher Hesse

Wojciech Zaremba

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,503 papers shown

FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning

347

4

0

09 Oct 2025

Automatic Text Box Placement for Supporting Typographic Design

Daichi Haraguchi

110

0

0

09 Oct 2025

Scaling Laws for Code: A More Data-Hungry Regime

110

2

0

09 Oct 2025

Mobile Gamer Lifetime Value Prediction via Objective Decomposition and Reconstruction

118

0

0

09 Oct 2025

Upfront Chain-of-Thought: A Cooperative Framework for Chain-of-Thought Compression

Chengzhengxu Li

140

0

0

09 Oct 2025

Robust Heuristic Algorithm Design with LLMs

Siva Kesava Reddy Kakarla

69

1

0

09 Oct 2025

First Try Matters: Revisiting the Role of Reflection in Reasoning Models

121

4

0

09 Oct 2025

Guided Star-Shaped Masked Diffusion

Viacheslav Meshchaninov

Danil Sheshenya

Nikita Balagansky

Daniil Gavrilov

164

1

0

09 Oct 2025

RA-Gen: A Controllable Code Generation Framework Using ReAct for Multi-Agent Task Execution

95

1

0

09 Oct 2025

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

141

3

0

09 Oct 2025

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Kazuki Egashira

Thibaud Gloaguen

191

1

0

09 Oct 2025

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

...

153

2

0

09 Oct 2025

Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

Anderson Schneider

Yuriy Nevmyvaka

181

5

0

09 Oct 2025

MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding

Siddeshwar Raghavan

136

0

0

09 Oct 2025

TGPR: Tree-Guided Policy Refinement for Robust Self-Debugging of LLMs

Ekaterina Trofimova

112

0

0

08 Oct 2025

Fortifying LLM-Based Code Generation with Graph-Based Reasoning on Secure Coding Practices

82

0

0

08 Oct 2025

Beyond Models: A Framework for Contextual and Cultural Intelligence in African AI Deployment

28

0

0

08 Oct 2025

Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography

106

3

0

08 Oct 2025

Incorporating Expert Knowledge into Bayesian Causal Discovery of Mixtures of Directed Acyclic Graphs

Zachris Björkman

151

3

0

08 Oct 2025

Vibe Checker: Aligning Code Evaluation with Human Preference

...

Jeremiah Zhe Liu

Benoit Schillings

132

0

0

08 Oct 2025

U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking

112

1

0

08 Oct 2025

Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection

Franco Javier Arellano

José Ignacio Orlando

97

0

0

08 Oct 2025

Don't Adapt Small Language Models for Tools; Adapt Tool Schemas to the Models

216

0

0

08 Oct 2025

POME: Post Optimization Model Edit via Muon-style Projection

97

0

0

08 Oct 2025

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Swarnadeep Saha

246

4

0

08 Oct 2025

ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models

109

0

0

07 Oct 2025

The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning

Alessandro Favero

237

1

0

07 Oct 2025

EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget

Hinrich Schutze

117

0

0

07 Oct 2025

Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding

Nikita Pavlichenko

Ekaterina Garanina

...

Kirill Chekmenev

Yaroslav Golubev

Uladzislau Sazanovich

116

0

0

07 Oct 2025

Vul-R2: A Reasoning LLM for Automated Vulnerability Repair

108

2

0

07 Oct 2025

AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning

Sangeetha Abdu Jyothi

165

0

0

07 Oct 2025

lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models

108

1

0

07 Oct 2025

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits

111

3

0

07 Oct 2025

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Silvio Savarese

137

1

0

07 Oct 2025

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

146

3

0

07 Oct 2025

Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning

Jonas Hübotter

Leander Diaz-Bone

160

1

0

06 Oct 2025

Context Length Alone Hurts LLM Performance Despite Perfect Retrieval

Subendhu Rongali

207

8

0

06 Oct 2025

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Abedelkadir Asi

141

0

0

06 Oct 2025

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

...

192

6

0

06 Oct 2025

GRACE: Generative Representation Learning via Contrastive Policy Optimization

Pengcheng Jiang

87

0

0

06 Oct 2025

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

166

4

0

06 Oct 2025

FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration

Justine Gehring

Silvio Soares Ribeiro Junior

141

0

0

06 Oct 2025

The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures

Alexander Fichtl

128

0

0

06 Oct 2025

Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models

151

1

0

06 Oct 2025

Modeling Student Learning with 3.8 Million Program Traces

Megha Srivastava

Jeremiah Blanchard

93

5

0

06 Oct 2025

AutoEmpirical: LLM-Based Automated Research for Empirical Software Fault Analysis

85

1

0

06 Oct 2025

FedSRD: Sparsify-Reconstruct-Decompose for Communication-Efficient Federated Large Language Models Fine-Tuning

193

0

0

06 Oct 2025

GA4GC: Greener Agent for Greener Code via Multi-Objective Configuration Optimization

...

Karine Even-Mendoza

Hector Menendez

91

1

0

05 Oct 2025

The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View

182

0

0

05 Oct 2025

What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

135

1

0

05 Oct 2025
