arXiv: 2107.03374

Evaluating Large Language Models Trained on Code

7 July 2021

Henrique Pondé

Harrison Edwards

Nicholas Joseph

Gretchen Krueger

Mohammad Bavarian

Philippe Tillet

Matthias Plappert

Fotios Chantzis

Elizabeth Barnes

Ariel Herbert-Voss

William H. Guss

Igor Babuschkin

William Saunders

Christopher Hesse

Wojciech Zaremba

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,505 papers shown

What a diff makes: automating code migration with large language models

Katherine A. Rosenfeld

57

0

0

31 Oct 2025

DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries

102

0

0

31 Oct 2025

EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge

Jack FitzGerald

Aristotelis Lazaridis

Jonnathan Castillo

...

Jamie Cuticello

Colton Malkerson

320

0

0

30 Oct 2025

Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

157

0

0

30 Oct 2025

QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback

Tatsuya Ishigaki

Masayuki Kawarada

Tadashi Kadowaki

...

Tamotsu Basseda

Hiroya Takamura

256

1

0

30 Oct 2025

Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis

241

0

0

30 Oct 2025

Do LLMs Signal When They're Right? Evidence from Neuron Agreement

77

1

0

30 Oct 2025

Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation

Musfiqur Rahman

SayedHassan Khatoonabadi

374

1

0

30 Oct 2025

Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models

207

0

0

30 Oct 2025

Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math

Silvio Savarese

120

0

0

30 Oct 2025

LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits

Amir Reza Mirzaei

491

0

0

30 Oct 2025

BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning

168

0

0

30 Oct 2025

OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education

161

0

0

30 Oct 2025

Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead

...

113

0

0

29 Oct 2025

Predicate Renaming via Large Language Models

Elisabetta Gentili

Fabrizio Riguzzi

112

0

0

29 Oct 2025

Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents

100

1

0

29 Oct 2025

User Misconceptions of LLM-Based Conversational Programming Assistants

Gabrielle O'Brien

Antonio Pedro Santos Alves

Sebastian Baltes

Marcos Kalinowski

101

0

0

29 Oct 2025

Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

118

0

0

29 Oct 2025

Uncovering Gaps Between RFC Updates and TCP/IP Implementations: LLM-Facilitated Differential Checks on Intermediate Representations

60

0

0

28 Oct 2025

Pearl: A Foundation Model for Placing Every Atom in the Right Location

Genesis Research Team

Alejandro Dobles

...

Maruan Al-Shedivat

Aleksandra Faust

Evan N. Feinberg

Michael V. LeVine

278

0

0

28 Oct 2025

StorageXTuner: An LLM Agent-Driven Automatic Tuning Framework for Heterogeneous Storage Systems

72

1

0

28 Oct 2025

Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs

91

0

0

28 Oct 2025

Parallel Loop Transformer for Efficient Test-Time Computation Scaling

...

119

2

0

28 Oct 2025

Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment

78

0

0

28 Oct 2025

APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training

104

0

0

28 Oct 2025

A Survey on LLM Mid-Training

240

2

0

27 Oct 2025

Evaluating the effectiveness of LLM-based interoperability

Rodrigo Falcão

Stefan Schweitzer

Frank Elberzhager

24

2

0

27 Oct 2025

The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation

Mikhail Arkhipov

Evgeniy Glukhov

109

0

0

27 Oct 2025

ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning

140

0

0

27 Oct 2025

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

YuanBing Ouyang

201

1

0

27 Oct 2025

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Christos Thrampoulidis

190

0

0

27 Oct 2025

PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization

188

2

0

27 Oct 2025

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Jorg K. H. Franke

372

0

0

27 Oct 2025

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Shahriar Kabir Nahin

Anshuman Chhabra

299

4

0

27 Oct 2025

Multi-Agent Evolve: LLM Self-Improve through Co-evolution

295

6

0

27 Oct 2025

Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

100

1

0

26 Oct 2025

Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs

356

1

0

25 Oct 2025

Harnessing the Power of Large Language Models for Software Testing Education: A Focus on ISTQB Syllabus

Ushik Shrestha Khwakhali

49

0

0

25 Oct 2025

PortGPT: Towards Automated Backporting Using Large Language Models

139

0

0

25 Oct 2025

Software Engineering Agents for Embodied Controller Generation : A Study in Minigrid Environments

Timothé Boulet

Clément Moulin-Frier

100

0

0

24 Oct 2025

Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling

104

0

0

24 Oct 2025

Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing

Iskander Azangulov

Teodora Pandeva

Niranjani Prasad

Sushrut Karmalkar

93

1

0

24 Oct 2025

Model Merging with Functional Dual Anchors

272

0

0

24 Oct 2025

Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only

...

89

1

0

24 Oct 2025

Risk Management for Mitigating Benchmark Failure Modes: BenchRisk

Armstrong Foundjem

Aishwarya Ramasethu

...

148

0

0

24 Oct 2025

Securing AI Agent Execution

Christoph Bühler

Matteo Biagiola

Guido Salvaneschi

275

3

0

24 Oct 2025

Designing and Evaluating Hint Generation Systems for Science Education

Smaranda Muresan

297

0

0

24 Oct 2025

Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts

...

174

1

0

24 Oct 2025

Relative-Based Scaling Law for Neural Language Models

145

0

0

23 Oct 2025

SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large Spreadsheets

246

0

0

22 Oct 2025
