2107.03374
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Papers citing "Evaluating Large Language Models Trained on Code"
50 / 4,451 papers shown
Title
AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
Guanxi Lu
Hao Mark Chen
Yuto Karashima
Zhican Wang
Daichi Fujiki
Hongxiang Fan
AI4CE
94
0
0
30 Sep 2025
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
H. Dai
Maoquan Wang
Mengnan Qi
Yikai Zhang
Zijian Jin
Yongqiang Yao
Yufan Huang
Shengyu Fu
Elsie Nallipogu
LLMAG
66
0
0
30 Sep 2025
Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
Yuheng Zhang
Wenlin Yao
Changlong Yu
Yao Liu
Qingyu Yin
Bing Yin
Hyokun Yun
Lihong Li
97
0
0
30 Sep 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Minhui Zhu
Minyang Tian
Xiaocheng Yang
Tianci Zhou
Lifan Yuan
...
Ruixing Zhang
X. Wang
Ofir Press
Nicolas Chia
Eliu A. Huerta
LRM
ELM
78
2
0
30 Sep 2025
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yanfeng Wang
Y. Wang
OffRL
LRM
84
0
0
30 Sep 2025
dParallel: Learnable Parallel Decoding for dLLMs
Zigeng Chen
Gongfan Fang
Xinyin Ma
Ruonan Yu
Xinchao Wang
88
5
0
30 Sep 2025
MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement
Youpeng Li
Kartik Joshi
Xinda Wang
Eric Wong
86
1
0
30 Sep 2025
LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
Zhanda Zhu
Qidong Su
Yaoyao Ding
Kevin Song
Shang Wang
Gennady Pekhimenko
MoMe
148
0
0
30 Sep 2025
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation
Miao Rang
Zhenni Bi
Hang Zhou
Hanting Chen
An Xiao
Tianyu Guo
Kai Han
Xinghao Chen
Yunhe Wang
129
1
0
30 Sep 2025
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Dengming Zhang
Xiaowen Ma
Zhenliang Ni
Zhenkai Wu
Han Shu
Xin Jiang
Xinghao Chen
MoMe
132
2
0
30 Sep 2025
Accelerating LLM Inference with Precomputed Query Storage
Jay H. Park
Youngju Cho
Choungsol Lee
Moonwook Oh
Euiseong Seo
RALM
28
0
0
30 Sep 2025
DyFlow: Dynamic Workflow Framework for Agentic Reasoning
Yanbo Wang
Z. Xu
Yue Huang
Xiangqi Wang
Zirui Song
...
Xiangru Tang
Yue Zhao
Arman Cohan
Xiangliang Zhang
Xiuying Chen
LRM
AI4CE
101
0
0
30 Sep 2025
PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
Xin Yu
Cong Xie
Ziyu Zhao
Tiantian Fan
Lingzhou Xue
Zhi-Li Zhang
192
0
0
30 Sep 2025
Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models
Jaesung R. Park
Junsu Kim
Gyeongman Kim
Jinyoung Jo
Sean Choi
Jaewoong Cho
Ernest K. Ryu
57
1
0
30 Sep 2025
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Zelin Tan
Hejia Geng
M. Zhang
Xiaohang Yu
Guancheng Wan
...
Zaibin Zhang
G. Zhang
Chen Zhang
Z. Yin
Wenlong Zhang
OffRL
LRM
234
2
1
29 Sep 2025
Short window attention enables long-term memorization
Loic Cabannes
Maximilian Beck
Gergely Szilvasy
Matthijs Douze
Maria Lomeli
Jade Copet
Pierre-Emmanuel Mazaré
Gabriel Synnaeve
Hervé Jégou
108
1
0
29 Sep 2025
SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models
Jun Rao
Yunjie Liao
Xuebo Liu
Zepeng Lin
Lian Lian
Dong Jin
Shengjun Cheng
Jun-chen Yu
Min Zhang
116
0
0
29 Sep 2025
Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development
Yuxuan Wan
Tingshuo Liang
Jiakai Xu
Jingyu Xiao
Yintong Huo
Michael R. Lyu
LLMAG
289
2
0
29 Sep 2025
Evaluating SAP Joule for Code Generation
Joshua Heisler
Johannes Reisinger
Andreas Fischer
ELM
68
0
0
29 Sep 2025
ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models
Dongqi Zheng
LLMAG
KELM
LRM
40
0
0
29 Sep 2025
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Zherui Li
Zheng Nie
Zhenhong Zhou
Yufei Guo
Yue Liu
Y. Zhang
Yu Cheng
Qingsong Wen
Kun Wang
Jiaheng Zhang
AAML
119
0
0
29 Sep 2025
LLaDA-MoE: A Sparse MoE Diffusion Language Model
Fengqi Zhu
Zebin You
Yipeng Xing
Zenan Huang
Lin Liu
...
Junbo Zhao
Da Zheng
Chongxuan Li
Jianguo Li
J. Wen
MoE
176
8
0
29 Sep 2025
Agentic Exploration of Physics Models
Maximilian Nägele
Florian Marquardt
LLMAG
AI4CE
107
1
0
29 Sep 2025
RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance
Tianlang Chen
Minkai Xu
Jure Leskovec
Stefano Ermon
LRM
AI4CE
110
2
0
29 Sep 2025
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Weilin Zhao
Z. Zhou
Zhou Su
Chaojun Xiao
Yuxuan Li
...
Ruoyao Xiao
Yuxiang Huang
Ao Sun
Xu Han
Zhiyuan Liu
VLM
143
4
0
29 Sep 2025
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Changsheng Zhao
E. Chang
Zechun Liu
Chia-Jung Chang
Wei Wen
...
Rick Cao
Yuandong Tian
Raghuraman Krishnamoorthi
Yangyang Shi
Vikas Chandra
ReLM
LRM
157
2
0
29 Sep 2025
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training
Hongcheng Wang
Yinuo Huang
Sukai Wang
Guanghui Ren
Hao Dong
LRM
113
3
0
29 Sep 2025
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
Y. Jiang
J. Huang
Yufeng Yuan
Xin Mao
Yu Yue
Qianchuan Zhao
Lin Yan
65
0
0
29 Sep 2025
MAS²: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Kun Wang
G. Zhang
ManKit Ye
Xinyu Deng
Dongxia Wang
Xiaobin Hu
Jinyang Guo
Yang Liu
Yufei Guo
LLMAG
106
0
0
29 Sep 2025
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
J. Liu
Sijun He
Jingjing Wu
X. Wang
Yang Chen
Zhaoqi Kuang
Siqi Bao
Yuan Yao
ELM
LRM
116
0
0
29 Sep 2025
Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search
Yingqian Cui
Zhenwei Dai
Pengfei He
Bing He
Hui Liu
...
Jingying Zeng
Suhang Wang
Yue Xing
Shucheng Zhou
Benoit Dumoulin
OffRL
LRM
81
1
0
29 Sep 2025
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Haoran He
Yuxiao Ye
Qingpeng Cai
Chen-Hao Hu
Binxing Jiao
Daxin Jiang
Ling Pan
OffRL
LRM
82
0
0
29 Sep 2025
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Huu Nguyen
Victor May
Harsh Raj
Marianna Nezhurina
Yishan Wang
...
Aleksandra Krasnodębska
Christoph Schuhmann
Mats Leon Richter
Xuan-Son
J. Jitsev
131
1
0
29 Sep 2025
UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
FaQiang Qian
WeiKun Zhang
Ziliang Wang
Kang An
Xuhui Zheng
Liangjian Wen
Mengya Gao
Yong Dai
Yichao Wu
68
1
0
29 Sep 2025
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang
Yue Ding
Jingwen Yang
Tianwei Luo
Dongbai Li
Ranjie Duan
Qiang Liu
Hang Su
Yinpeng Dong
Jun Zhu
LRM
65
1
0
29 Sep 2025
Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
Wenrui Bao
Zhiben Chen
Dan Xu
Yuzhang Shang
128
0
0
29 Sep 2025
Fast Thinking for Large Language Models
Haoyu Zheng
Zhuonan Wang
Yuqian Yuan
Tianwei Lin
Wenqiao Zhang
Zheqi Lv
Juncheng Li
Siliang Tang
Yueting Zhuang
Hongyang He
OffRL
LLMAG
ReLM
LRM
219
1
0
28 Sep 2025
LLM/Agent-as-Data-Analyst: A Survey
Zirui Tang
Weizheng Wang
Z. Zhou
Yang Jiao
Bangrui Xu
...
Conghui He
Bin Wang
Conghui He
Xiaoyang Wang
Fan Wu
166
5
0
28 Sep 2025
Future-Proofing Programmers: Optimal Knowledge Tracing for AI-Assisted Personalized Education
Yuchen Wang
Pei-Duo Yu
C. Tan
56
0
0
28 Sep 2025
Diagnosing Failure Root Causes in Platform-Orchestrated Agentic Systems: Dataset, Taxonomy, and Benchmark
Xuyan Ma
Xiaofei Xie
Yawen Wang
Junjie Wang
Boyu Wu
Mingyang Li
Qing Wang
104
0
0
28 Sep 2025
Sequential Diffusion Language Models
Yangzhou Liu
Yue Cao
Hao-Wen Li
Gen Luo
Z. Chen
...
Yuqiang Li
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
76
3
0
28 Sep 2025
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
K. Deng
Zizheng Zhan
Wen Xiang
Wenqiang Zhu
Tianhao Peng
...
Jie Liu
Zhaoxiang Zhang
Haotian Zhang
Bin Chen
Jiaheng Liu
LRM
116
2
0
28 Sep 2025
Timber: Training-free Instruct Model Refining with Base via Effective Rank
Taiqiang Wu
Runming Yang
Tao Liu
Jiahao Wang
Zenan Xu
Ngai Wong
72
1
0
28 Sep 2025
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms
Jiahao Ying
Mingbao Lin
Qianru Sun
Yixin Cao
MoE
36
0
0
28 Sep 2025
Toward Preference-aligned Large Language Models via Residual-based Model Steering
Lucio La Cava
Andrea Tagarelli
LLMSV
140
0
0
28 Sep 2025
PerfBench: Can Agents Resolve Real-World Performance Bugs?
Spandan Garg
Roshanak Zilouchian Moghaddam
Neel Sundaresan
147
0
0
28 Sep 2025
Anchored Supervised Fine-Tuning
He Zhu
Junyou Su
Peng Lai
Ren Ma
Wenjia Zhang
L. Yang
Guanhua Chen
OffRL
128
0
0
28 Sep 2025
Pretraining Scaling Laws for Generative Evaluations of Language Models
Rylan Schaeffer
Noam Levi
Brando Miranda
Sanmi Koyejo
64
0
0
28 Sep 2025
Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction
Qimin Zhong
Hao Liao
Siwei Wang
Mingyang Zhou
X. Wu
Rui Mao
Wei Chen
178
0
0
27 Sep 2025
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Wonje Jeung
Sangyeon Yoon
Yoonjun Cho
Dongjae Jeon
Sangwoo Shin
Hyesoo Hong
Albert No
DiffM
129
0
0
27 Sep 2025