Evaluating Large Language Models Trained on Code

7 July 2021

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,505 papers shown

RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval

154

27 Sep 2025

Protocode: Prototype-Driven Interpretability for Code Generation in LLMs

Krishna Vamshi Bodla

Haizhao Yang

127

27 Sep 2025

Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification

141

27 Sep 2025

Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models

125

27 Sep 2025

Planner Aware Path Learning in Diffusion Language Models Training

172

27 Sep 2025

Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs

111

27 Sep 2025

BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software

...

27 Sep 2025

CoDA: Coding LM via Diffusion Adaptation

...

116

27 Sep 2025

An LLM-Powered Agent for Real-Time Analysis of the Vietnamese IT Job Market

26 Sep 2025

A benchmark for vericoding: formally verified program synthesis

Sergiu Bursuc

Theodore Ehrenborg

Shaowei Lin

Lacramioara Astefanoaei

...

26 Sep 2025

Multi-Agent Path Finding via Offline RL and LLM Collaboration

Jyotirmoy V. Deshmukh

AI4CE

127

26 Sep 2025

QoNext: Towards Next-generation QoE for Foundation Models

239

26 Sep 2025

PSRT: Accelerating LRM-based Guard Models via Prefilled Safe Reasoning Traces

131

26 Sep 2025

AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans

Yangtian Zi

Zixuan Wu

Aleksander Boruch-Gruszecki

Jonathan Bell

Arjun Guha

164

26 Sep 2025

Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

26 Sep 2025

Compiling by Proving: Language-Agnostic Automatic Optimization from Formal Semantics

26 Sep 2025

Stochastic activations

Pierre-Emmanuel Mazaré

Hervé Jégou

LLMSV

274

26 Sep 2025

Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation

Xunzhu Tang

Iyiola Emmanuel Olatunji

Tiezhu Sun

Jacques Klein

Tegawende F. Bissyande

LRM

26 Sep 2025

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

Jonas Belouadi

T. Boubekeur

Adrien Kaiser

109

26 Sep 2025

FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning

128

26 Sep 2025

The Emergence of Altruism in Large-Language-Model Agents Society

26 Sep 2025

Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

142

26 Sep 2025

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

...

26 Sep 2025

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

120

26 Sep 2025

FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding

Haorui Chen

Chengze Li

Jia Li

26 Sep 2025

What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs

Xingyu Li

Juefei Pu

Yifan Wu

Xiaochen Zou

Shitong Zhu

...

26 Sep 2025

The Rogue Scalpel: Activation Steering Compromises LLM Safety

145

26 Sep 2025

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

188

26 Sep 2025

Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning

112

26 Sep 2025

VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

26 Sep 2025

A Benchmark for Localizing Code and Non-Code Issues in Software Projects

121

26 Sep 2025

A State-of-the-Art SQL Reasoning Model using RLVR

...

Jose Javier Gonzalez Ortiz

Matei A. Zaharia

Yue Zhang

OffRL ReLM LRM

142

25 Sep 2025

Verification Limits Code LLM Training

129

25 Sep 2025

TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix

Ahmet Caner Yüzügüler

Ahmet Çelik

Jiawei Zhuang

Lukas Cavigelli

164

25 Sep 2025

Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say

183

25 Sep 2025

RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs

207

25 Sep 2025

StyleBench: Evaluating thinking styles in Large Language Models

1.1K

25 Sep 2025

RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?

174

25 Sep 2025

Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems

166

25 Sep 2025

SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs

...

360

25 Sep 2025

Predicting LLM Reasoning Performance with Small Proxy Model

271

25 Sep 2025

Towards Transparent AI: A Survey on Explainable Language Models

Avash Palikhe

Sribala Vidyadhari Chinta

178

25 Sep 2025

Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns

216

25 Sep 2025

InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?

25 Sep 2025

Enhancing Linear Attention with Residual Learning

118

24 Sep 2025

Intuition to Evidence: Measuring AI's True Impact on Developer Productivity

...

147

24 Sep 2025

Thinking Augmented Pre-training

300

24 Sep 2025

Automated Multi-Agent Workflows for RTL Design

149

24 Sep 2025

Benchmarking Web API Integration Code Generation

AAAI Conference on Artificial Intelligence (AAAI), 2024

Daniel Maninger

Leon Chemnitz

Amir Molzam Sharifloo

Jannis Brugger

Mira Mezini

133

24 Sep 2025

FastEagle: Cascaded Drafting for Accelerating Speculative Decoding

111

24 Sep 2025