
arXiv:2107.03374

Evaluating Large Language Models Trained on Code

7 July 2021

Henrique Pondé

Harrison Edwards

Nicholas Joseph

Gretchen Krueger

Mohammad Bavarian

Philippe Tillet

Matthias Plappert

Fotios Chantzis

Elizabeth Barnes

Ariel Herbert-Voss

William H. Guss

Igor Babuschkin

William Saunders

Christopher Hesse

Wojciech Zaremba

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,505 papers shown

T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval

133

1

0

03 Aug 2025

MLP Memory: A Retriever-Pretrained Memory for Large Language Models

274

0

0

03 Aug 2025

EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

162

2

0

03 Aug 2025

Importance Sampling is All You Need: Predict LLM's performance on new benchmark by reusing existing benchmark

156

1

0

02 Aug 2025

How Far Are LLMs from Symbolic Planners? An NLP-Based Perspective

Albert Meroño-Peñuela

97

1

0

02 Aug 2025

TreeDiff: AST-Guided Code Generation with Diffusion LLMs

Tingting Yu

209

4

0

02 Aug 2025

Categorical Construction of Logically Verifiable Neural Architectures

116

0

0

02 Aug 2025

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

Sajana Weerawardhena

...

185

4

0

01 Aug 2025

R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge

136

0

0

01 Aug 2025

Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models

171

13

0

01 Aug 2025

Oedipus and the Sphinx: Benchmarking and Improving Visual Language Models for Complex Graphic Reasoning

ReLM CoGe LRM VLM

158

1

0

01 Aug 2025

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

...

369

18

0

31 Jul 2025

AutoBridge: Automating Smart Device Integration with Centralized Platform

155

0

0

31 Jul 2025

DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

173

0

0

31 Jul 2025

Unveiling Super Experts in Mixture-of-Experts Large Language Models

281

3

0

31 Jul 2025

SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

Meera Bharadwaj

Aparna Garimella

Jordan L. Boyd-Graber

247

0

0

30 Jul 2025

IFEvalCode: Controlled Code Generation

...

Wangchunshu Zhou

239

3

0

30 Jul 2025

GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries

Daniel Fernandes

Carlos M. Fernandes

Bruno D. Ferreira-Saraiva

J. Matos-Carvalho

183

3

0

30 Jul 2025

On LLM-Assisted Generation of Smart Contracts from Business Processes

184

0

0

30 Jul 2025

League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

Shuoyoucheng Ma

Xiaofeng Wang

Baosheng Wang

191

0

0

30 Jul 2025

From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications

Cameron S. Movassaghi

Amanda Momenzadeh

66

1

0

30 Jul 2025

UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases

Raj Vardhan Tomar

248

3

0

29 Jul 2025

ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge

...

171

4

0

29 Jul 2025

Enhancing Project-Specific Code Completion by Inferring Internal API Information

IEEE Transactions on Software Engineering (TSE), 2025

178

6

0

28 Jul 2025

FMimic: Foundation Models are Fine-grained Action Learners from Human Videos

The International Journal of Robotics Research (IJRR), 2025

...

158

5

0

28 Jul 2025

On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey

Shouzheng Huang

267

3

0

28 Jul 2025

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories

Gennady Pekhimenko

209

2

0

28 Jul 2025

Kimi K2: Open Agentic Intelligence

...

182

84

0

28 Jul 2025

LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

255

2

0

28 Jul 2025

When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions

Rihab Bouyousfi

208

2

0

27 Jul 2025

CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation

143

0

0

26 Jul 2025

The Impact of Fine-tuning Large Language Models on Automated Program Repair

Roman Macháček

Anastasiia Grishina

155

1

0

26 Jul 2025

Flora: Effortless Context Construction to Arbitrary Length and Scale

253

1

0

26 Jul 2025

MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?

Dilek Hakkani-Tür

Ismini Lourentzou

171

1

0

25 Jul 2025

PennyCoder: Efficient Domain-Specific LLMs for PennyLane-Based Quantum Code Generation

Muhammad Haider Asif

Muhammad Kashif

Alberto Marchisio

Muhammad Shafique

160

2

0

25 Jul 2025

PurpCode: Reasoning for Safer Code Generation

...

Hadjer Benkraouda

Ismini Lourentzou

447

7

0

25 Jul 2025

Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems

IEEE Transactions on Smart Grid (IEEE Trans. Smart Grid), 2025

221

3

0

25 Jul 2025

CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback

154

2

0

25 Jul 2025

Learning neuro-symbolic convergent term rewriting systems

Flavio Petruzzellis

Alberto Testolin

126

0

0

25 Jul 2025

MemoCoder: Automated Function Synthesis using LLM-Supported Agents

Zhen Ming Jiang

220

0

0

24 Jul 2025

Technical Report of TeleChat2, TeleChat2.5 and T1

...

Shuangyong Song

428

6

0

24 Jul 2025

Hybrid and Unitary PEFT for Resource-Efficient Large Language Models

167

1

0

24 Jul 2025

Automated Code Review Using Large Language Models with Symbolic Reasoning

International Service Availability Symposium (ISAS), 2025

160

0

0

24 Jul 2025

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment

376

11

0

24 Jul 2025

AccessGuru: Leveraging LLMs to Detect and Correct Web Accessibility Violations in HTML Code

Nadeen Fathallah

Daniel Hernández

147

2

0

24 Jul 2025

Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs

189

13

0

24 Jul 2025

Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation

341

16

0

24 Jul 2025

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

148

7

0

23 Jul 2025

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

416

2

0

23 Jul 2025

WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training

264

6

0

23 Jul 2025
