ResearchTrend.AI
Evaluating Large Language Models Trained on Code
Versions: v1, v2 (latest)

7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
arXiv: 2107.03374 (abs, PDF, HTML); Hugging Face: 8 upvotes

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,499 papers shown
Towards Understanding Self-play for LLM Reasoning
Justin Yang Chae
Md Tanvirul Alam
Nidhi Rastogi
ReLM, LRM
361
0
0
31 Oct 2025
What a diff makes: automating code migration with large language models
Katherine A. Rosenfeld
Cliff C. Kerr
Jessica Lundin
40
0
0
31 Oct 2025
Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models
J. Curtò
I. D. Zarzà
Pablo García
Jordi Cabot
ELM, LRM
195
0
0
30 Oct 2025
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Fulin Lin
S. Chen
Ruishan Fang
Hongwei Wang
Tao Lin
LLMAG
140
0
0
30 Oct 2025
LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits
Amir Reza Mirzaei
Yuqiao Wen
Yanshuai Cao
Lili Mou
MQ
465
0
0
30 Oct 2025
Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis
Dong Huang
Mingzhe Du
J. Zhang
Zheng Lin
Meng Luo
Qianru Zhang
See-Kiong Ng
ELM
228
0
0
30 Oct 2025
Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation
Musfiqur Rahman
SayedHassan Khatoonabadi
Emad Shihab
ELM
367
1
0
30 Oct 2025
QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback
Taku Mikuriya
Tatsuya Ishigaki
Masayuki Kawarada
Shunya Minami
Tadashi Kadowaki
...
Shunya Takata
Takumi Kato
Tamotsu Basseda
Reo Yamada
Hiroya Takamura
ALM, ELM
241
1
0
30 Oct 2025
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Qianli Shen
Daoyuan Chen
Yilun Huang
Zhenqing Ling
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
156
0
0
30 Oct 2025
Do LLMs Signal When They're Right? Evidence from Neuron Agreement
Kang Chen
Yaoning Wang
Kai Xiong
Zhuoka Feng
Wenhe Sun
Haotian Chen
Yixin Cao
68
0
0
30 Oct 2025
EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge
Jack FitzGerald
Aristotelis Lazaridis
Dylan Bates
Aman Sharma
Jonnathan Castillo
...
Dave Anderson
Jonathan Beck
Jamie Cuticello
Colton Malkerson
Tyler Saltsman
ELM
298
0
0
30 Oct 2025
OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education
Min Zhang
Hao Chen
Hao Chen
Wenqi Zhang
Didi Zhu
Xin Lin
Bo Jiang
Aimin Zhou
Fei Wu
Kun Kuang
ELM
152
0
0
30 Oct 2025
Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
Bo Pang
Deqian Kong
Silvio Savarese
Caiming Xiong
Yingbo Zhou
LRM
108
0
0
30 Oct 2025
Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead
Guang Yang
Wei-Shi Zheng
Xiang Chen
Dong Liang
Peng Hu
...
Haotian Cheng
Yiheng Shen
Xing Hu
Terry Yue Zhuo
David Lo
52
0
0
29 Oct 2025
Predicate Renaming via Large Language Models
Elisabetta Gentili
Tony Ribeiro
Fabrizio Riguzzi
Katsumi Inoue
LRM
99
0
0
29 Oct 2025
User Misconceptions of LLM-Based Conversational Programming Assistants
Gabrielle O'Brien
Antonio Pedro Santos Alves
Sebastian Baltes
Grischa Liebel
Mircea Lungu
Marcos Kalinowski
81
0
0
29 Oct 2025
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
Jiayi Kuang
Yinghui Li
Xin Zhang
Yangning Li
Di Yin
Xing Sun
Ying Shen
Philip S. Yu
92
1
0
29 Oct 2025
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
Fali Wang
Jihai Chen
Shuhua Yang
Runxue Bao
Tianxiang Zhao
Zhiwei Zhang
Xianfeng Tang
Hui Liu
Qi He
Suhang Wang
92
0
0
29 Oct 2025
Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Bohong Wu
Mengzhao Chen
Xiang Luo
Shen Yan
Qifan Yu
...
Hongrui Zhan
Zheng Zhong
Xun Zhou
Siyuan Qiao
Xingyan Bin
108
2
0
28 Oct 2025
Uncovering Gaps Between RFC Updates and TCP/IP Implementations: LLM-Facilitated Differential Checks on Intermediate Representations
Yifan Wu
Xuewei Feng
Yuxiang Yang
Ke Xu
56
0
0
28 Oct 2025
StorageXTuner: An LLM Agent-Driven Automatic Tuning Framework for Heterogeneous Storage Systems
Qi Lin
Zhenyu Zhang
Viraj Thakkar
Zhenjie Sun
Mai Zheng
Zhichao Cao
69
1
0
28 Oct 2025
Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment
Jian Gu
A. Aleti
Chunyang Chen
Hongyu Zhang
77
0
0
28 Oct 2025
Pearl: A Foundation Model for Placing Every Atom in the Right Location
Genesis Research Team
Alejandro Dobles
Nina Jovic
Kenneth Leidal
Pranav Murugan
...
Maruan Al-Shedivat
Aleksandra Faust
Evan N. Feinberg
Michael V. LeVine
Matteus Pan
255
0
0
28 Oct 2025
Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs
Xing Xing
Wei Wang
Lipeng Ma
Weidong Yang
Junjie Zheng
79
0
0
28 Oct 2025
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training
Jiarui Qin
Yunjia Xi
Junjie Huang
Renting Rui
D. Yin
Weiwen Liu
Yong Yu
W. Zhang
Xing Sun
80
0
0
28 Oct 2025
Evaluating the effectiveness of LLM-based interoperability
Rodrigo Falcão
Stefan Schweitzer
Julien Siebert
Emily Calvet
Frank Elberzhager
20
1
0
27 Oct 2025
PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
Xinhai Wang
Shu Yang
Liangyu Wang
L. Zhang
Huanyi Xie
Lijie Hu
Di Wang
169
2
0
27 Oct 2025
Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
Christos Thrampoulidis
Sadegh Mahdavi
Wenlong Deng
OffRL
173
0
0
27 Oct 2025
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov
Mikhail Arkhipov
Ksenia Sycheva
Evgeniy Glukhov
Egor Bogomolov
99
0
0
27 Oct 2025
Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks
Amal Abed
Ivan Lukic
Jorg K. H. Franke
Frank Hutter
SyDa, LRM
357
0
0
27 Oct 2025
ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
Yilang Zhang
Xiaodong Yang
Y. Cai
G. Giannakis
132
0
0
27 Oct 2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Yixing Chen
Yiding Wang
Siqi Zhu
Haofei Yu
Tao Feng
Muhan Zhang
M. Patwary
Jiaxuan You
LLMAG, LRM
275
4
0
27 Oct 2025
Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies
Bin Wang
Y. Zhong
MiDi Wan
W. Yu
YuanBing Ouyang
Y. Huang
Hui Li
SILM, AAML
194
1
0
27 Oct 2025
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Shrestha Datta
Shahriar Kabir Nahin
Anshuman Chhabra
P. Mohapatra
LLMAG, LM&Ro
252
2
0
27 Oct 2025
A Survey on LLM Mid-Training
Chengying Tu
Xuemiao Zhang
Rongxiang Weng
Rumei Li
Chen Zhang
Yang Bai
Hongfei Yan
Jingang Wang
Xunliang Cai
OffRL, LRM
229
1
0
27 Oct 2025
Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization
Yijia Fan
Jusheng Zhang
Jing Yang
Keze Wang
LLMAG
100
1
0
26 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM, CLL
317
1
0
25 Oct 2025
Harnessing the Power of Large Language Models for Software Testing Education: A Focus on ISTQB Syllabus
Tuan-Phong Ngo
Bao-Ngoc Duong
Tuan-Anh Hoang
Joshua Dwight
Ushik Shrestha Khwakhali
48
0
0
25 Oct 2025
PortGPT: Towards Automated Backporting Using Large Language Models
Zhaoyang Li
Zheng Yu
Jingyi Song
Meng Xu
Yuxuan Luo
Dongliang Mu
VLM
132
0
0
25 Oct 2025
Software Engineering Agents for Embodied Controller Generation : A Study in Minigrid Environments
Timothé Boulet
X. Hinaut
Clément Moulin-Frier
84
0
0
24 Oct 2025
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Iskander Azangulov
Teodora Pandeva
Niranjani Prasad
Javier Zazo
Sushrut Karmalkar
DiffM
84
1
0
24 Oct 2025
Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
Yuxuan Tang
Yifan Feng
100
0
0
24 Oct 2025
Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts
Hongwei Zhang
Ji Lu
Shiqing Jiang
Chenxiang Zhu
Li Xie
...
Baoyu Tang
Lingjun Huang
Baoli Wang
Fang Tan
Peng Zou
LRM
170
1
0
24 Oct 2025
Model Merging with Functional Dual Anchors
Kexuan Shi
Yandong Wen
Weiyang Liu
MoMe
267
0
0
24 Oct 2025
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
Qingru Zhang
Liang Qiu
Ilgee Hong
Zhenghao Xu
Tianyi Liu
...
Bing Yin
Chao Zhang
Jianshu Chen
Haoming Jiang
T. Zhao
76
1
0
24 Oct 2025
Securing AI Agent Execution
Christoph Bühler
Matteo Biagiola
Luca Di Grazia
Guido Salvaneschi
LLMAG
261
1
0
24 Oct 2025
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor
Victor Lu
Vassil Tashev
Armstrong Foundjem
Aishwarya Ramasethu
...
Chris Knotz
Kongtao Chen
Alicia Parrish
Anka Reuel
Heather Frase
133
0
0
24 Oct 2025
Designing and Evaluating Hint Generation Systems for Science Education
Anubhav Jangra
Smaranda Muresan
AI4Ed, ELM
248
0
0
24 Oct 2025
Relative-Based Scaling Law for Neural Language Models
Baoqing Yue
Jinyuan Zhou
Zixi Wei
Jingtao Zhan
Qingyao Ai
Yiqun Liu
116
0
0
23 Oct 2025
SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations
Amila Indika
Igor Molybog
LMTD
156
1
0
22 Oct 2025