2107.03374
Cited By
v1
v2 (latest)
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
HuggingFace (8 upvotes)
Papers citing "Evaluating Large Language Models Trained on Code"
50 / 4,503 papers shown
FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning
Yunbo Li
Jiaping Gui
Zhihang Deng
Fanchao Meng
Yue Wu
FedML
347
4
0
09 Oct 2025
Automatic Text Box Placement for Supporting Typographic Design
Jun Muraoka
Daichi Haraguchi
Naoto Inoue
Wataru Shimoda
Kota Yamaguchi
Seiichi Uchida
110
0
0
09 Oct 2025
Scaling Laws for Code: A More Data-Hungry Regime
Xianzhen Luo
Wenzhen Zheng
Qingfu Zhu
Rongyi Zhang
Houyi Li
Siming Huang
YuanTao Fan
Wanxiang Che
ALM
110
2
0
09 Oct 2025
Mobile Gamer Lifetime Value Prediction via Objective Decomposition and Reconstruction
Tianwei Li
Yu Zhao
Yunze Li
Sheng Li
118
0
0
09 Oct 2025
Upfront Chain-of-Thought: A Cooperative Framework for Chain-of-Thought Compression
Chengzhengxu Li
Xiaoming Liu
Zhaohan Zhang
Shaochu Zhang
Shengchao Liu
Guoxin Ma
Y. Lan
Chao Shen
LRM
140
0
0
09 Oct 2025
Robust Heuristic Algorithm Design with LLMs
Pantea Karimi
Dany Rouhana
Pooria Namyar
Siva Kesava Reddy Kakarla
Venkat Arun
Behnaz Arzani
69
1
0
09 Oct 2025
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Liwei Kang
Yue Deng
Yao Xiao
Zhanfeng Mo
Wee Sun Lee
Lidong Bing
LRM
121
4
0
09 Oct 2025
Guided Star-Shaped Masked Diffusion
Viacheslav Meshchaninov
Egor Shibaev
Artem Makoian
Ivan Klimov
Danil Sheshenya
A. Malinin
Nikita Balagansky
Daniil Gavrilov
Aibek Alanov
Dmitry Vetrov
DiffM
164
1
0
09 Oct 2025
RA-Gen: A Controllable Code Generation Framework Using ReAct for Multi-Agent Task Execution
Aofan Liu
Haoxuan Li
Bin Wang
Ao Yang
Hui Li
LLMAG
95
1
0
09 Oct 2025
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue
Yifan Zhou
G. Zhang
Zaibin Zhang
Y. Li
Chen Zhang
Z. Yin
Philip Torr
Wanli Ouyang
Lei Bai
LLMAG
141
3
0
09 Oct 2025
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Kazuki Egashira
Robin Staab
Thibaud Gloaguen
Mark Vero
Martin Vechev
AAML
191
1
0
09 Oct 2025
Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
Cheng Yang
X. J. Yang
Licheng Wen
Daocheng Fu
Jianbiao Mei
...
Yufan Shen
Nianchen Deng
Ding Wang
Yu Qiao
Haifeng Li
LLMAG
RALM
153
2
0
09 Oct 2025
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
Kevin Rojas
Jiahe Lin
Kashif Rasul
Anderson Schneider
Yuriy Nevmyvaka
Molei Tao
Wei Deng
181
5
0
09 Oct 2025
MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding
Siddeshwar Raghavan
Tanwi Mallick
AI4CE
136
0
0
09 Oct 2025
TGPR: Tree-Guided Policy Refinement for Robust Self-Debugging of LLMs
Daria Ozerova
Ekaterina Trofimova
LRM
112
0
0
08 Oct 2025
Fortifying LLM-Based Code Generation with Graph-Based Reasoning on Secure Coding Practices
Rupam Patir
Keyan Guo
Haipeng Cai
Hongxin Hu
LRM
82
0
0
08 Oct 2025
Beyond Models: A Framework for Contextual and Cultural Intelligence in African AI Deployment
Qness Ndlovu
28
0
0
08 Oct 2025
Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography
Jiuan Zhou
Yu Cheng
Yuan Xie
Z. Yin
106
3
0
08 Oct 2025
Incorporating Expert Knowledge into Bayesian Causal Discovery of Mixtures of Directed Acyclic Graphs
Zachris Björkman
Jorge Loría
S. Wharrie
Samuel Kaski
CML
151
3
0
08 Oct 2025
Vibe Checker: Aligning Code Evaluation with Human Preference
Ming Zhong
Xiang Zhou
T. Chang
Q. Wang
Nan Xu
...
Shyam Upadhyay
Jeremiah Zhe Liu
Jiawei Han
Benoit Schillings
Jiao Sun
132
0
0
08 Oct 2025
U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking
Fenghe Tang
Chengqi Dong
Wenxin Ma
Zikang Xu
Heqin Zhu
Zihang Jiang
Rongsheng Wang
Yuhao Wang
Chenxu Wu
S. Kevin Zhou
ELM
VLM
112
1
0
08 Oct 2025
Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection
Franco Javier Arellano
José Ignacio Orlando
MedIm
97
0
0
08 Oct 2025
Don't Adapt Small Language Models for Tools; Adapt Tool Schemas to the Models
Jonggeun Lee
Woojung Song
Jongwook Han
Haesung Pyun
Yohan Jo
CLL
216
0
0
08 Oct 2025
POME: Post Optimization Model Edit via Muon-style Projection
Yong Liu
Di Fu
Yang Luo
Zirui Zhu
Minhao Cheng
Cho-Jui Hsieh
Yang You
97
0
0
08 Oct 2025
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Leitian Tao
I. Kulikov
Swarnadeep Saha
Tianlu Wang
Jing Xu
Yixuan Li
Jason Weston
Ping Yu
OffRL
LRM
246
4
0
08 Oct 2025
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
Zhangyue Yin
Qiushi Sun
Zhiyuan Zeng
Zhiyuan Yu
Zengfeng Huang
Xuanjing Huang
Xipeng Qiu
LRM
109
0
0
07 Oct 2025
The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning
Alessandro Favero
PINN
GNN
237
1
0
07 Oct 2025
EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
Liang Chen
Xueting Han
Qizhou Wang
Bo Han
Jing Bai
Hinrich Schutze
Kam-Fai Wong
117
0
0
07 Oct 2025
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding
Nikita Pavlichenko
Iurii Nazarov
Ivan Dolgov
Ekaterina Garanina
Dmitry Ustalov
...
Kirill Chekmenev
Joseph Shtok
Yaroslav Golubev
Anton Semenkin
Uladzislau Sazanovich
116
0
0
07 Oct 2025
Vul-R2: A Reasoning LLM for Automated Vulnerability Repair
Xin-Cheng Wen
Zirui Lin
Yijun Yang
Cuiyun Gao
Deheng Ye
LRM
108
2
0
07 Oct 2025
AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
Yurun Song
Zhuoyi Yang
Ian G. Harris
Sangeetha Abdu Jyothi
MQ
165
0
0
07 Oct 2025
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang
Xiaolong Tu
Hongyu Ke
Huirong Chai
Dawei Chen
Kyungtae Han
108
1
0
07 Oct 2025
CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits
Kangyu Wang
Zhiyun Jiang
Haibo Feng
Weijia Zhao
Lin Liu
Jianguo Li
Zhenzhong Lan
Weiyao Lin
111
3
0
07 Oct 2025
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Zhepeng Cen
H. Chen
Shiyu Wang
Zuxin Liu
Zhiwei Liu
Ding Zhao
Silvio Savarese
Caiming Xiong
Huan Wang
Weiran Yao
OffRL
137
1
0
07 Oct 2025
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
Yoav Gur-Arieh
Mor Geva
Atticus Geiger
KELM
146
3
0
07 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
160
1
0
06 Oct 2025
Context Length Alone Hurts LLM Performance Despite Perfect Retrieval
Yufeng Du
Minyang Tian
S. Ronanki
Subendhu Rongali
S. Bodapati
Aram Galstyan
Azton Wells
Roy Schwartz
Eliu A. Huerta
Hao Peng
RALM
LRM
207
8
0
06 Oct 2025
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
Dachuan Shi
Abedelkadir Asi
Keying Li
Xiangchi Yuan
Leyan Pan
Wenke Lee
Wen Xiao
LRM
141
0
0
06 Oct 2025
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
Wonjun Kang
Kevin Galim
Seunghyuk Oh
M Lee
Yuchen Zeng
...
Coleman Hooper
Yuezhou Hu
H. Koo
N. Cho
Kangwook Lee
192
6
0
06 Oct 2025
GRACE: Generative Representation Learning via Contrastive Policy Optimization
Jiashuo Sun
Shixuan Liu
Zhaochen Su
Xianrui Zhong
Pengcheng Jiang
Sara Szymkuć
Peiran Li
Weijia Shi
Jiawei Han
87
0
0
06 Oct 2025
Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
Yicheng Tao
Yao Qin
Yepang Liu
166
4
0
06 Oct 2025
FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
Victor May
Diganta Misra
Yanqi Luo
Anjali Sridhar
Justine Gehring
Silvio Soares Ribeiro Junior
141
0
0
06 Oct 2025
The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures
Alexander Fichtl
Jeremias Bohn
Josefin Kelber
Edoardo Mosca
Georg Groh
128
0
0
06 Oct 2025
Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models
Runchu Tian
Junxia Cui
Xueqiang Xu
Feng Yao
Jingbo Shang
151
1
0
06 Oct 2025
Modeling Student Learning with 3.8 Million Program Traces
Alexis Ross
Megha Srivastava
Jeremiah Blanchard
Jacob Andreas
93
5
0
06 Oct 2025
AutoEmpirical: LLM-Based Automated Research for Empirical Software Fault Analysis
Jiongchi Yu
Weipeng Jiang
Xiaoyu Zhang
Qiang Hu
Xiaofei Xie
Chao Shen
85
1
0
06 Oct 2025
FedSRD: Sparsify-Reconstruct-Decompose for Communication-Efficient Federated Large Language Models Fine-Tuning
Guochen Yan
Luyuan Xie
Qingni Shen
Yuejian Fang
Zhonghai Wu
FedML
193
0
0
06 Oct 2025
GA4GC: Greener Agent for Greener Code via Multi-Objective Configuration Optimization
Jingzhi Gong
Yixin Bian
Luis de la Cal
Giovanni Pinna
Anisha Uteem
...
M. Zamorano
Karine Even-Mendoza
W.B. Langdon
Hector Menendez
Federica Sarro
91
1
0
05 Oct 2025
The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View
Xinhao Yao
Lu Yu
Xiaolin Hu
Fengwei Teng
Qing Cui
Jun Zhou
Yong Liu
LRM
182
0
0
05 Oct 2025
What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models
Zicong He
Boxuan Zhang
Weihao Liu
Ruixiang Tang
Lu Cheng
ELM
135
1
0
05 Oct 2025