2107.03374
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Papers citing "Evaluating Large Language Models Trained on Code"
50 / 4,505 papers shown
What a diff makes: automating code migration with large language models
Katherine A. Rosenfeld
Cliff C. Kerr
Jessica Lundin
57
0
0
31 Oct 2025
DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries
Chuxuan Hu
Maxwell Yang
James Weiland
Yeji Lim
Suhas Palawala
Daniel Kang
102
0
0
31 Oct 2025
EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge
Jack FitzGerald
Aristotelis Lazaridis
Dylan Bates
Aman Sharma
Jonnathan Castillo
...
Dave Anderson
Jonathan Beck
Jamie Cuticello
Colton Malkerson
Tyler Saltsman
ELM
320
0
0
30 Oct 2025
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Fulin Lin
S. Chen
Ruishan Fang
Hongwei Wang
Tao Lin
LLMAG
157
0
0
30 Oct 2025
QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback
Taku Mikuriya
Tatsuya Ishigaki
Masayuki Kawarada
Shunya Minami
Tadashi Kadowaki
...
Shunya Takata
Takumi Kato
Tamotsu Basseda
Reo Yamada
Hiroya Takamura
ALM
ELM
256
1
0
30 Oct 2025
Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis
Dong Huang
Mingzhe Du
J. Zhang
Zheng Lin
Meng Luo
Qianru Zhang
See-Kiong Ng
ELM
241
0
0
30 Oct 2025
Do LLMs Signal When They're Right? Evidence from Neuron Agreement
Kang Chen
Yaoning Wang
Kai Xiong
Zhuoka Feng
Wenhe Sun
Haotian Chen
Yixin Cao
77
1
0
30 Oct 2025
Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation
Musfiqur Rahman
SayedHassan Khatoonabadi
Emad Shihab
ELM
374
1
0
30 Oct 2025
Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models
J. Curtò
I. D. Zarzà
Pablo García
Jordi Cabot
ELM
LRM
207
0
0
30 Oct 2025
Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
Bo Pang
Deqian Kong
Silvio Savarese
Caiming Xiong
Yingbo Zhou
LRM
120
0
0
30 Oct 2025
LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits
Amir Reza Mirzaei
Yuqiao Wen
Yanshuai Cao
Lili Mou
MQ
491
0
0
30 Oct 2025
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Qianli Shen
Daoyuan Chen
Yilun Huang
Zhenqing Ling
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
168
0
0
30 Oct 2025
OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education
Min Zhang
Hao Chen
Hao Chen
Wenqi Zhang
Didi Zhu
Xin Lin
Bo Jiang
Aimin Zhou
Fei Wu
Kun Kuang
ELM
161
0
0
30 Oct 2025
Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead
Guang Yang
Wei-Shi Zheng
Xiang Chen
Dong Liang
Peng Hu
...
Haotian Cheng
Yiheng Shen
Xing Hu
Terry Yue Zhuo
David Lo
113
0
0
29 Oct 2025
Predicate Renaming via Large Language Models
Elisabetta Gentili
Tony Ribeiro
Fabrizio Riguzzi
Katsumi Inoue
LRM
112
0
0
29 Oct 2025
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
Jiayi Kuang
Yinghui Li
Xin Zhang
Yangning Li
Di Yin
Xing Sun
Ying Shen
Philip S. Yu
100
1
0
29 Oct 2025
User Misconceptions of LLM-Based Conversational Programming Assistants
Gabrielle O'Brien
Antonio Pedro Santos Alves
Sebastian Baltes
Grischa Liebel
Mircea Lungu
Marcos Kalinowski
101
0
0
29 Oct 2025
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
Fali Wang
Jihai Chen
Shuhua Yang
Runxue Bao
Tianxiang Zhao
Zhiwei Zhang
Xianfeng Tang
Hui Liu
Qi He
Suhang Wang
118
0
0
29 Oct 2025
Uncovering Gaps Between RFC Updates and TCP/IP Implementations: LLM-Facilitated Differential Checks on Intermediate Representations
Yifan Wu
Xuewei Feng
Yuxiang Yang
Ke Xu
60
0
0
28 Oct 2025
Pearl: A Foundation Model for Placing Every Atom in the Right Location
Genesis Research Team
Alejandro Dobles
Nina Jovic
Kenneth Leidal
Pranav Murugan
...
Maruan Al-Shedivat
Aleksandra Faust
Evan N. Feinberg
Michael V. LeVine
Matteus Pan
278
0
0
28 Oct 2025
StorageXTuner: An LLM Agent-Driven Automatic Tuning Framework for Heterogeneous Storage Systems
Qi Lin
Zhenyu Zhang
Viraj Thakkar
Zhenjie Sun
Mai Zheng
Zhichao Cao
72
1
0
28 Oct 2025
Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs
Xing Xing
Wei Wang
Lipeng Ma
Weidong Yang
Junjie Zheng
91
0
0
28 Oct 2025
Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Bohong Wu
Mengzhao Chen
Xiang Luo
Shen Yan
Qifan Yu
...
Hongrui Zhan
Zheng Zhong
Xun Zhou
Siyuan Qiao
Xingyan Bin
119
2
0
28 Oct 2025
Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment
Jian Gu
A. Aleti
Chunyang Chen
Hongyu Zhang
78
0
0
28 Oct 2025
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training
Jiarui Qin
Yunjia Xi
Junjie Huang
Renting Rui
D. Yin
Weiwen Liu
Yong Yu
W. Zhang
Xing Sun
104
0
0
28 Oct 2025
A Survey on LLM Mid-Training
Chengying Tu
Xuemiao Zhang
Rongxiang Weng
Rumei Li
Chen Zhang
Yang Bai
Hongfei Yan
Jingang Wang
Xunliang Cai
OffRL
LRM
240
2
0
27 Oct 2025
Evaluating the effectiveness of LLM-based interoperability
Rodrigo Falcão
Stefan Schweitzer
Julien Siebert
Emily Calvet
Frank Elberzhager
24
2
0
27 Oct 2025
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov
Mikhail Arkhipov
Ksenia Sycheva
Evgeniy Glukhov
Egor Bogomolov
109
0
0
27 Oct 2025
ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
Yilang Zhang
Xiaodong Yang
Y. Cai
G. Giannakis
140
0
0
27 Oct 2025
Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies
Bin Wang
Y. Zhong
MiDi Wan
W. Yu
YuanBing Ouyang
Y. Huang
Hui Li
SILM
AAML
201
1
0
27 Oct 2025
Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
Christos Thrampoulidis
Sadegh Mahdavi
Wenlong Deng
OffRL
190
0
0
27 Oct 2025
PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
Xinhai Wang
Shu Yang
Liangyu Wang
L. Zhang
Huanyi Xie
Lijie Hu
Di Wang
188
2
0
27 Oct 2025
Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks
Amal Abed
Ivan Lukic
Jorg K. H. Franke
Frank Hutter
SyDa
LRM
372
0
0
27 Oct 2025
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Shrestha Datta
Shahriar Kabir Nahin
Anshuman Chhabra
P. Mohapatra
LLMAG
LM&Ro
299
4
0
27 Oct 2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Yixing Chen
Yiding Wang
Siqi Zhu
Haofei Yu
Tao Feng
Muhan Zhang
M. Patwary
Jiaxuan You
LLMAG
LRM
295
6
0
27 Oct 2025
Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization
Yijia Fan
Jusheng Zhang
Jing Yang
Keze Wang
LLMAG
100
1
0
26 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM
CLL
356
1
0
25 Oct 2025
Harnessing the Power of Large Language Models for Software Testing Education: A Focus on ISTQB Syllabus
Tuan-Phong Ngo
Bao-Ngoc Duong
Tuan-Anh Hoang
Joshua Dwight
Ushik Shrestha Khwakhali
49
0
0
25 Oct 2025
PortGPT: Towards Automated Backporting Using Large Language Models
Zhaoyang Li
Zheng Yu
Jingyi Song
Meng Xu
Yuxuan Luo
Dongliang Mu
VLM
139
0
0
25 Oct 2025
Software Engineering Agents for Embodied Controller Generation : A Study in Minigrid Environments
Timothé Boulet
X. Hinaut
Clément Moulin-Frier
100
0
0
24 Oct 2025
Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
Yuxuan Tang
Yifan Feng
104
0
0
24 Oct 2025
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Iskander Azangulov
Teodora Pandeva
Niranjani Prasad
Javier Zazo
Sushrut Karmalkar
DiffM
93
1
0
24 Oct 2025
Model Merging with Functional Dual Anchors
Kexuan Shi
Yandong Wen
Weiyang Liu
MoMe
272
0
0
24 Oct 2025
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
Qingru Zhang
Liang Qiu
Ilgee Hong
Zhenghao Xu
Tianyi Liu
...
Bing Yin
Chao Zhang
Jianshu Chen
Haoming Jiang
T. Zhao
89
1
0
24 Oct 2025
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor
Victor Lu
Vassil Tashev
Armstrong Foundjem
Aishwarya Ramasethu
...
Chris Knotz
Kongtao Chen
Alicia Parrish
Anka Reuel
Heather Frase
148
0
0
24 Oct 2025
Securing AI Agent Execution
Christoph Bühler
Matteo Biagiola
Luca Di Grazia
Guido Salvaneschi
LLMAG
275
3
0
24 Oct 2025
Designing and Evaluating Hint Generation Systems for Science Education
Anubhav Jangra
Smaranda Muresan
AI4Ed
ELM
297
0
0
24 Oct 2025
Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts
Hongwei Zhang
Ji Lu
Shiqing Jiang
Chenxiang Zhu
Li Xie
...
Baoyu Tang
Lingjun Huang
Baoli Wang
Fang Tan
Peng Zou
LRM
174
1
0
24 Oct 2025
Relative-Based Scaling Law for Neural Language Models
Baoqing Yue
Jinyuan Zhou
Zixi Wei
Jingtao Zhan
Qingyao Ai
Yiqun Liu
145
0
0
23 Oct 2025
SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large Spreadsheets
Ziwei Wang
Jiayuan Su
Mengyu Zhou
Huaxing Zeng
Mengni Jia
Xiao Lv
Haoyu Dong
Xiaojun Ma
Shi Han
Dongmei Zhang
LMTD
246
0
0
22 Oct 2025