Evaluating Large Language Models Trained on Code

7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,505 papers shown
RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval
Pratik Shah
Rajat Ghosh
Aryan Singhal
Debojyoti Dutta
154
0
0
27 Sep 2025
Protocode: Prototype-Driven Interpretability for Code Generation in LLMs
Krishna Vamshi Bodla
Haizhao Yang
127
1
0
27 Sep 2025
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
X. Xu
Xin Li
Xingwei Qu
Jie Fu
Hang Zhao
CoGe, LRM
141
1
0
27 Sep 2025
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Tianao Zhang
Zhiteng Li
Xianglong Yan
Haotong Qin
Yong Guo
Yulun Zhang
MQ
125
0
0
27 Sep 2025
Planner Aware Path Learning in Diffusion Language Models Training
Fred Zhangzhi Peng
Zachary Bezemek
Jarrid Rector-Brooks
Shuibai Zhang
Anru R. Zhang
Michael Bronstein
A. Bose
Alexander Tong
172
0
0
27 Sep 2025
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Chenxing Wei
Hong Wang
Ying He
Fei Richard Yu
Yao Shu
111
1
0
27 Sep 2025
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
Zehua Zhang
Ati Priya Bajaj
Divij Handa
Siyu Liu
Arvind S Raj
...
Nikhil Chapre
Yan Shoshitaishvili
Adam Doupé
Chitta Baral
Ruoyu Wang
61
0
0
27 Sep 2025
CoDA: Coding LM via Diffusion Adaptation
H. Chen
Shiyu Wang
Can Qin
B. Pang
Zuxin Liu
...
Shelby Heinecke
Silvio Savarese
Caiming Xiong
Huan Wang
Weiran Yao
DiffM
116
1
0
27 Sep 2025
An LLM-Powered Agent for Real-Time Analysis of the Vietnamese IT Job Market
Minh-Thuan Nguyen
Thien Vo-Thanh
Thai-Duy Dinh
Xuan-Quang Phan
Tan-Ha Mai
Lam-Son Lê
65
0
0
26 Sep 2025
A benchmark for vericoding: formally verified program synthesis
Sergiu Bursuc
Theodore Ehrenborg
Shaowei Lin
Lacramioara Astefanoaei
Ionel Emilian Chiosa
...
Adem Bizid
Quinn Dougherty
Miranda Zhao
Max Tan
Max Tegmark
73
1
0
26 Sep 2025
Multi-Agent Path Finding via Offline RL and LLM Collaboration
Merve Atasever
Matthew Hong
Mihir Nitin Kulkarni
Qingpei Li
Jyotirmoy V. Deshmukh
AI4CE
127
0
0
26 Sep 2025
QoNext: Towards Next-generation QoE for Foundation Models
Yijin Guo
Ye Shen
Farong Wen
Junying Wang
Zicheng Zhang
Qi Jia
Guangtao Zhai
239
0
0
26 Sep 2025
PSRT: Accelerating LRM-based Guard Models via Prefilled Safe Reasoning Traces
Jiawei Zhao
Yuang Qi
Weiming Zhang
Nenghai Yu
Kejiang Chen
LRM
131
0
0
26 Sep 2025
AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans
Yangtian Zi
Zixuan Wu
Aleksander Boruch-Gruszecki
Jonathan Bell
Arjun Guha
164
0
0
26 Sep 2025
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
Yixuan Han
Fan Ma
Ruijie Quan
Yi Yang
MoE, LRM
99
0
0
26 Sep 2025
Compiling by Proving: Language-Agnostic Automatic Optimization from Formal Semantics
Jianhong Zhao
Everett Hildenbrandt
Juan Conejero
Yongwang Zhao
36
0
0
26 Sep 2025
Stochastic activations
Maria Lomeli
Matthijs Douze
Gergely Szilvasy
Loic Cabannes
Jade Copet
Sainbayar Sukhbaatar
Jason Weston
Gabriel Synnaeve
Pierre-Emmanuel Mazaré
Hervé Jégou
LLMSV
274
0
0
26 Sep 2025
Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation
Xunzhu Tang
Iyiola Emmanuel Olatunji
Tiezhu Sun
Jacques Klein
Tegawende F. Bissyande
LRM
85
1
0
26 Sep 2025
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Jonas Belouadi
T. Boubekeur
Adrien Kaiser
109
0
0
26 Sep 2025
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
Yizhou Zhang
Ning Lv
T. Wang
Jisheng Dang
OffRL, LRM
128
1
0
26 Sep 2025
The Emergence of Altruism in Large-Language-Model Agents Society
Haoyang Li
Xiao Jia
Zhanzhan Zhao
LM&Ro
65
0
0
26 Sep 2025
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Shijing Hu
Jingyang Li
Zhihui Lu
Pan Zhou
142
0
0
26 Sep 2025
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
Naibin Gu
Zhenyu Zhang
Yuchen Feng
Yilong Chen
Peng Fu
...
Shuohuan Wang
Yu Sun
Hua Wu
Weiping Wang
Haifeng Wang
MoE
87
0
0
26 Sep 2025
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Syeda Nahida Akter
Shrimai Prabhumoye
Eric Nyberg
M. Patwary
Mohammad Shoeybi
Yejin Choi
Bryan Catanzaro
AIFin, LRM, AI4CE
120
7
0
26 Sep 2025
FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding
Haorui Chen
Chengze Li
Jia Li
90
0
0
26 Sep 2025
What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs
Xingyu Li
Juefei Pu
Yifan Wu
Xiaochen Zou
Shitong Zhu
...
Zhiyun Qian
Kangjie Lu
Trent Jaeger
Michael De Lucia
S. Krishnamurthy
65
1
0
26 Sep 2025
The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Ivan Oseledets
Elena Tutubalina
LLMSV, AAML
145
1
0
26 Sep 2025
GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
Hanlin Zhu
Tianyu Guo
Song Mei
Stuart Russell
Nikhil Ghosh
Alberto Bietti
Jiantao Jiao
LLMAG, LRM
188
0
0
26 Sep 2025
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Chi Ruan
Dongfu Jiang
Yubo Wang
Wenhu Chen
OffRL, ALM, LRM
112
1
0
26 Sep 2025
VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs
Shun-ichiro Hayashi
Koki Morita
Daichi Mukunoki
Tetsuya Hoshino
Takahiro Katagiri
99
0
0
26 Sep 2025
A Benchmark for Localizing Code and Non-Code Issues in Software Projects
Zejun Zhang
Jian-Xun Wang
Qingyun Yang
Yifan Pan
Yi Tang
Yi Li
Zhenchang Xing
Tian Zhang
X. Li
G. Zhang
121
1
0
26 Sep 2025
A State-of-the-Art SQL Reasoning Model using RLVR
Alnur Ali
Ashutosh Baheti
Jonathan D. Chang
Ta-Chung Chi
Brandon Cui
...
Dipendra Kumar Misra
Krista Opsahl-Ong
Jose Javier Gonzalez Ortiz
Matei A. Zaharia
Yue Zhang
OffRL, ReLM, LRM
142
1
0
25 Sep 2025
Verification Limits Code LLM Training
Srishti Gureja
Elena Tommasone
Jingyi He
Sara Hooker
Matthias Gallé
Marzieh Fadaee
ALM, OffRL
129
1
0
25 Sep 2025
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Ahmet Caner Yüzügüler
Ahmet Çelik
Jiawei Zhuang
Lukas Cavigelli
164
0
0
25 Sep 2025
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
Jacob Fein-Ashley
Dhruv Parikh
Rajgopal Kannan
Viktor Prasanna
MoE, MoMe, LRM
183
2
0
25 Sep 2025
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Kohsei Matsutani
Shota Takashiro
Gouki Minegishi
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
OffRL, ReLM, LRM
207
6
0
25 Sep 2025
StyleBench: Evaluating thinking styles in Large Language Models
Junyu Guo
S. Gu
Ming Jin
C. Spanos
Javad Lavaei
LRM
1.1K
1
0
25 Sep 2025
RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?
Yiyou Sun
Yuhan Cao
Pohao Huang
Haoyue Bai
Hannaneh Hajishirzi
Nouha Dziri
Dawn Song
OffRL, LRM
174
0
0
25 Sep 2025
Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems
Junfeng Yan
Biao Wu
Meng Fang
Ling Chen
166
0
0
25 Sep 2025
SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
J. Lin
Zhongruo Wang
Kun Qian
Tian Wang
Arvind Srinivasan
...
Weiqi Zhang
Sujay Sanghavi
C. L. P. Chen
Hyokun Yun
Lihong Li
CLL
360
1
0
25 Sep 2025
Predicting LLM Reasoning Performance with Small Proxy Model
Woosung Koh
Juyoung Suk
Sungjun Han
Se-Young Yun
Jay Shin
LRM, AI4CE
271
0
0
25 Sep 2025
Towards Transparent AI: A Survey on Explainable Language Models
Avash Palikhe
Sribala Vidyadhari Chinta
Zhipeng Yin
Rui Guo
Qiang Duan
Jie Yang
Wenbin Zhang
178
2
0
25 Sep 2025
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
Xuemiao Zhang
Can Ren
Chengying Tu
Rongxiang Weng
Shuo Wang
Hongfei Yan
Jingang Wang
Xunliang Cai
LRM, AI4CE
216
1
0
25 Sep 2025
InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?
Anjiang Wei
Tarun Suresh
Tianran Sun
Haoze Wu
Ke Wang
Alex Aiken
65
2
0
25 Sep 2025
Enhancing Linear Attention with Residual Learning
Xunhao Lai
Jialiang Kang
Jianqiao Lu
Tong Lin
Pengyu Zhao
KELM, CLL
118
0
0
24 Sep 2025
Intuition to Evidence: Measuring AI's True Impact on Developer Productivity
Anand Kumar
Vishal Khare
Deepak Sharma
Satyam Kumar
Vijay Saini
...
Sachendra Jain
Ankit Rana
Pratham Verma
Vaibhav Meena
Avinash Edubilli
147
3
0
24 Sep 2025
Thinking Augmented Pre-training
Liang Wang
Nan Yang
Shaohan Huang
Li Dong
Furu Wei
LRM
300
1
0
24 Sep 2025
Automated Multi-Agent Workflows for RTL Design
Amulya Bhattaram
Janani Ramamoorthy
Ranit Gupta
Diana Marculescu
Dimitrios Stamoulis
149
1
0
24 Sep 2025
Benchmarking Web API Integration Code Generation
AAAI Conference on Artificial Intelligence (AAAI), 2024
Daniel Maninger
Leon Chemnitz
Amir Molzam Sharifloo
Jannis Brugger
Mira Mezini
133
0
0
24 Sep 2025
FastEagle: Cascaded Drafting for Accelerating Speculative Decoding
Haiduo Huang
Jiangcheng Song
Wenzhe Zhao
Pengju Ren
111
0
0
24 Sep 2025