2107.03374
Cited By
v1
v2 (latest)
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Papers citing
"Evaluating Large Language Models Trained on Code"
50 / 4,505 papers shown
MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement
Youpeng Li
Kartik Joshi
Xinda Wang
Eric Wong
128
1
0
30 Sep 2025
Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models
Shutong Wu
Jiawei Zhang
DiffM
314
2
0
30 Sep 2025
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Dengming Zhang
Xiaowen Ma
Zhenliang Ni
Zhenkai Wu
Han Shu
Xin Jiang
Xinghao Chen
MoMe
152
2
0
30 Sep 2025
AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
Guanxi Lu
Hao Mark Chen
Yuto Karashima
Zhican Wang
Daichi Fujiki
Hongxiang Fan
AI4CE
112
4
0
30 Sep 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Minhui Zhu
Minyang Tian
Xiaocheng Yang
Tianci Zhou
Lifan Yuan
...
Ruixing Zhang
X. Wang
Ofir Press
Nicolas Chia
Eliu A. Huerta
LRM
ELM
142
2
0
30 Sep 2025
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
Yein Park
Minbyul Jeong
Jaewoo Kang
LRM
1.5K
1
0
30 Sep 2025
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
J. Liu
Sijun He
Jingjing Wu
X. Wang
Yang Chen
Zhaoqi Kuang
Siqi Bao
Yuan Yao
ELM
LRM
194
0
0
29 Sep 2025
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Weilin Zhao
Z. Zhou
Zhou Su
Chaojun Xiao
Yuxuan Li
...
Ruoyao Xiao
Yuxiang Huang
Ao Sun
Xu Han
Zhiyuan Liu
VLM
169
5
0
29 Sep 2025
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Haoran He
Yuxiao Ye
Qingpeng Cai
Chen-Hao Hu
Binxing Jiao
Daxin Jiang
Ling Pan
OffRL
LRM
115
1
0
29 Sep 2025
UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
FaQiang Qian
WeiKun Zhang
Ziliang Wang
Kang An
Xuhui Zheng
Liangjian Wen
Mengya Gao
Yong Dai
Yichao Wu
128
1
0
29 Sep 2025
Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search
Yingqian Cui
Zhenwei Dai
Pengfei He
Bing He
Hui Liu
...
Jingying Zeng
Suhang Wang
Yue Xing
Shucheng Zhou
Benoit Dumoulin
OffRL
LRM
111
1
0
29 Sep 2025
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
Y. Jiang
J. Huang
Yufeng Yuan
Xin Mao
Yu Yue
Qianchuan Zhao
Lin Yan
117
0
0
29 Sep 2025
LLaDA-MoE: A Sparse MoE Diffusion Language Model
Fengqi Zhu
Zebin You
Yipeng Xing
Zenan Huang
Lin Liu
...
Junbo Zhao
Da Zheng
Chongxuan Li
Jianguo Li
J. Wen
MoE
251
12
0
29 Sep 2025
RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance
Tianlang Chen
Minkai Xu
Jure Leskovec
Stefano Ermon
LRM
AI4CE
149
2
0
29 Sep 2025
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Zherui Li
Zheng Nie
Zhenhong Zhou
Yufei Guo
Yue Liu
Y. Zhang
Yu Cheng
Qingsong Wen
Kun Wang
Jiaheng Zhang
AAML
143
0
0
29 Sep 2025
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Huu Nguyen
Victor May
Harsh Raj
Marianna Nezhurina
Yishan Wang
...
Aleksandra Krasnodębska
Christoph Schuhmann
Mats Leon Richter
Xuan-Son
J. Jitsev
230
1
0
29 Sep 2025
Short window attention enables long-term memorization
Loic Cabannes
Maximilian Beck
Gergely Szilvasy
Matthijs Douze
Maria Lomeli
Jade Copet
Pierre-Emmanuel Mazaré
Gabriel Synnaeve
Hervé Jégou
150
1
0
29 Sep 2025
ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models
Dongqi Zheng
LLMAG
KELM
LRM
44
0
0
29 Sep 2025
SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models
Jun Rao
Yunjie Liao
Xuebo Liu
Zepeng Lin
Lian Lian
Dong Jin
Shengjun Cheng
Jun-chen Yu
Min Zhang
132
0
0
29 Sep 2025
Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development
Yuxuan Wan
Tingshuo Liang
Jiakai Xu
Jingyu Xiao
Yintong Huo
Michael R. Lyu
LLMAG
419
3
0
29 Sep 2025
Agentic Exploration of Physics Models
Maximilian Nägele
Florian Marquardt
LLMAG
AI4CE
207
1
0
29 Sep 2025
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Zelin Tan
Hejia Geng
M. Zhang
Xiaohang Yu
Guancheng Wan
...
G. Zhang
Chen Zhang
Z. Yin
Wenlong Zhang
Lei Bai
OffRL
LRM
452
3
1
29 Sep 2025
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang
Yue Ding
Jingwen Yang
Tianwei Luo
Dongbai Li
Ranjie Duan
Qiang Liu
Hang Su
Yinpeng Dong
Jun Zhu
LRM
137
1
0
29 Sep 2025
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training
Hongcheng Wang
Yinuo Huang
Sukai Wang
Guanghui Ren
Hao Dong
LRM
172
5
0
29 Sep 2025
MAS²: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Kun Wang
G. Zhang
ManKit Ye
Xinyu Deng
Dongxia Wang
Xiaobin Hu
Jinyang Guo
Yang Liu
Yufei Guo
LLMAG
127
0
0
29 Sep 2025
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Changsheng Zhao
E. Chang
Zechun Liu
Chia-Jung Chang
Wei Wen
...
Rick Cao
Yuandong Tian
Raghuraman Krishnamoorthi
Yangyang Shi
Vikas Chandra
ReLM
LRM
203
3
0
29 Sep 2025
Evaluating SAP Joule for Code Generation
Joshua Heisler
Johannes Reisinger
Andreas Fischer
ELM
88
0
0
29 Sep 2025
Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
Wenrui Bao
Zhiben Chen
Dan Xu
Yuzhang Shang
196
0
0
29 Sep 2025
Fast Thinking for Large Language Models
Haoyu Zheng
Zhuonan Wang
Yuqian Yuan
Tianwei Lin
Wenqiao Zhang
Zheqi Lv
Juncheng Li
Siliang Tang
Yueting Zhuang
Hongyang He
OffRL
LLMAG
ReLM
LRM
252
2
0
28 Sep 2025
LLM/Agent-as-Data-Analyst: A Survey
Zirui Tang
Weizheng Wang
Z. Zhou
Yang Jiao
Bangrui Xu
...
Conghui He
Bin Wang
Xiaoyang Wang
Fan Wu
239
6
0
28 Sep 2025
Sequential Diffusion Language Models
Yangzhou Liu
Yue Cao
Hao-Wen Li
Gen Luo
Z. Chen
...
Yuqiang Li
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
111
5
0
28 Sep 2025
Future-Proofing Programmers: Optimal Knowledge Tracing for AI-Assisted Personalized Education
Yuchen Wang
Pei-Duo Yu
C. Tan
105
0
0
28 Sep 2025
Timber: Training-free Instruct Model Refining with Base via Effective Rank
Taiqiang Wu
Runming Yang
Tao Liu
Jiahao Wang
Zenan Xu
Ngai Wong
114
1
0
28 Sep 2025
Diagnosing Failure Root Causes in Platform-Orchestrated Agentic Systems: Dataset, Taxonomy, and Benchmark
Xuyan Ma
Xiaofei Xie
Yawen Wang
Junjie Wang
Boyu Wu
Mingyang Li
Qing Wang
176
0
0
28 Sep 2025
Anchored Supervised Fine-Tuning
He Zhu
Junyou Su
Peng Lai
Ren Ma
Wenjia Zhang
L. Yang
Guanhua Chen
OffRL
195
0
0
28 Sep 2025
PerfBench: Can Agents Resolve Real-World Performance Bugs?
Spandan Garg
Roshanak Zilouchian Moghaddam
Neel Sundaresan
185
0
0
28 Sep 2025
Toward Preference-aligned Large Language Models via Residual-based Model Steering
Lucio La Cava
Andrea Tagarelli
LLMSV
163
0
0
28 Sep 2025
Pretraining Scaling Laws for Generative Evaluations of Language Models
Rylan Schaeffer
Noam Levi
Brando Miranda
Sanmi Koyejo
124
1
0
28 Sep 2025
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms
Jiahao Ying
Mingbao Lin
Qianru Sun
Yixin Cao
MoE
55
0
0
28 Sep 2025
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
K. Deng
Zizheng Zhan
Wen Xiang
Wenqiang Zhu
Tianhao Peng
...
Jie Liu
Zhaoxiang Zhang
Haotian Zhang
Bin Chen
Jiaheng Liu
LRM
161
2
0
28 Sep 2025
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Chenxing Wei
Hong Wang
Ying He
Fei Richard Yu
Yao Shu
108
1
0
27 Sep 2025
Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction
Qimin Zhong
Hao Liao
Siwei Wang
Mingyang Zhou
X. Wu
Rui Mao
Wei Chen
211
0
0
27 Sep 2025
d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
Yuchu Jiang
Yue Cai
Xiangzhong Luo
Jiale Fu
Jiarui Wang
Chonghan Liu
Xu Yang
106
6
0
27 Sep 2025
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Melody Zixuan Li
Kumar Krishna Agrawal
Arna Ghosh
Komal Kumar Teru
Adam Santoro
Guillaume Lajoie
Blake A. Richards
201
3
0
27 Sep 2025
RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval
Pratik Shah
Rajat Ghosh
Aryan Singhal
Debojyoti Dutta
154
0
0
27 Sep 2025
SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems
Qian Cheng
Ruize Tang
Emilie Ma
Finn Hackett
Peiyang He
Yiming Su
Ivan Beschastnikh
Yu Huang
Xiaoxing Ma
Tianyin Xu
140
0
0
27 Sep 2025
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
Zehua Zhang
Ati Priya Bajaj
Divij Handa
Siyu Liu
Arvind S Raj
...
Nikhil Chapre
Yan Shoshitaishvili
Adam Doupé
Chitta Baral
Ruoyu Wang
61
0
0
27 Sep 2025
Artificial Intelligence-Powered Assessment Framework for Skill-Oriented Engineering Lab Education
Vaishnavi Sharma
Rakesh Thakur
Shashwat Sharma
Kritika Panjanani
62
0
0
27 Sep 2025
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
Bingshuai Liu
Ante Wang
Zijun Min
Liang Yao
Haibo Zhang
Yang Liu
Anxiang Zeng
Jinsong Su
OffRL
LRM
200
5
0
27 Sep 2025
Protocode: Prototype-Driven Interpretability for Code Generation in LLMs
Krishna Vamshi Bodla
Haizhao Yang
127
1
0
27 Sep 2025