arXiv:2107.03374 (v2, latest)
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Papers citing "Evaluating Large Language Models Trained on Code"
Showing 50 of 4,483 papers
Evaluating LLM Story Generation through Large-scale Network Analysis of Social Structures
Hiroshi Nonaka
K. E. Perry
73
0
0
21 Oct 2025
RESCUE: Retrieval Augmented Secure Code Generation
Jiahao Shi
Tianyi Zhang
SILM
212
0
0
21 Oct 2025
CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
Shaobo Wang
Yongliang Miao
Yuancheng Liu
Qianli Ma
Ning Liao
Linfeng Zhang
LRM
145
1
0
21 Oct 2025
Learning from Generalization Patterns: An Evaluation-Driven Approach to Enhanced Data Augmentation for Fine-Tuning Small Language Models
Huan Song
Deeksha Razdan
Yiyue Qian
Arijit Ghosh Chowdhury
Parth Patwa
Aman Chadha
Shinan Zhang
Sharlina Keshava
Hannah R Marlowe
86
1
0
20 Oct 2025
Reasoning Distillation and Structural Alignment for Improved Code Generation
Amir Jalilifard
Anderson de Rezende Rocha
Marcos Medeiros Raimundo
OffRL
LRM
108
0
0
20 Oct 2025
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
Yihong Dong
Zhaoyu Ma
Xue Jiang
Zhiyuan Fan
Jiaru Qian
...
Rongyu Cao
B. Li
Fei Huang
Yongbin Li
Ge Li
112
3
0
20 Oct 2025
Soft-Masked Diffusion Language Models
Michael Hersche
Samuel Moor-Smith
Thomas Hofmann
Abbas Rahimi
256
0
0
20 Oct 2025
TREAT: A Code LLMs Trustworthiness / Reliability Evaluation and Testing Framework
Shuzheng Gao
E. Li
Man Ho Lam
Jingyu Xiao
Yuxuan Wan
Chaozheng Wang
Ng Man Tik
Michael R. Lyu
140
0
0
20 Oct 2025
JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs
Junlan Feng
Fanyu Meng
Chong Long
Pengyu Cong
Duqing Wang
...
Z. Ren
Fan Yang
Na Wu
Di Jin
Chao Deng
HILM
170
0
0
20 Oct 2025
Verification-Aware Planning for Multi-Agent Systems
Tianyang Xu
Dan Zhang
Kushan Mitra
Estevam R. Hruschka
LLMAG
76
0
0
20 Oct 2025
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen
Akari Asai
Luke Zettlemoyer
Hannaneh Hajishirzi
Faeze Brahman
OffRL
HILM
LRM
177
0
0
20 Oct 2025
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Jiawei Zhang
Andrew Estornell
David D. Baek
B. Li
Xiaojun Xu
152
0
0
20 Oct 2025
Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models
Shuodi Liu
Y. Liu
Zi Wang
Yusheng Wang
Huijia Wu
Liuyu Xiang
Zhaofeng He
96
0
0
20 Oct 2025
The Free Transformer
François Fleuret
40
0
0
20 Oct 2025
StreamingThinker: Large Language Models Can Think While Reading
Junlong Tong
Yingqi Fan
Anhao Zhao
Yunpu Ma
Xiaoyu Shen
RALM
LRM
271
1
0
20 Oct 2025
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou
Yixiu Mao
Yun Qu
Qi Wang
Xiangyang Ji
153
1
0
19 Oct 2025
What Limits Agentic Systems Efficiency?
S. Bian
Minghao Yan
Anand Jayarajan
Gennady Pekhimenko
Shivaram Venkataraman
LLMAG
LRM
121
0
0
18 Oct 2025
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Zhi Zhou
Yuhao Tan
Zenan Li
Yuan Yao
Lan-Zhe Guo
Yu-Feng Li
Xiaoxing Ma
LRM
94
0
0
17 Oct 2025
Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
Guiyao Tie
Zenghui Yuan
Zeli Zhao
Chaoran Hu
Tianhe Gu
...
Ming Jin
Qingsong Wen
Lixing Chen
P. Zhou
Lichao Sun
KELM
ReLM
LRM
245
1
0
17 Oct 2025
Attention Sinks in Diffusion Language Models
Maximo Eduardo Rulli
Simone Petruzzi
Edoardo Michielon
Fabrizio Silvestri
Simone Scardapane
Alessio Devoto
64
1
0
17 Oct 2025
Learning to Answer from Correct Demonstrations
Nirmit Joshi
Gene Li
Siddharth Bhandari
Shiva Prasad Kasiviswanathan
Cong Ma
Nathan Srebro
OffRL
96
0
0
17 Oct 2025
An Experimental Study of Real-Life LLM-Proposed Performance Improvements
Lirong Yi
Gregory Gay
Philipp Leitner
68
0
0
17 Oct 2025
Helmsman: Autonomous Synthesis of Federated Learning Systems via Collaborative LLM Agents
Haoyuan Li
Mathias Funk
Aaqib Saeed
118
0
0
16 Oct 2025
Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification
Aofan Liu
Shiyuan Song
Haoxuan Li
Cehao Yang
Yiyan Qi
72
1
0
16 Oct 2025
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
Mehrzad Samadi
Aleksander Ficek
Sean Narenthiran
Siddhartha Jain
Wasi Uddin Ahmad
Somshubra Majumdar
Vahid Noroozi
Boris Ginsburg
LRM
76
0
0
16 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat
AI4TS
LRM
279
0
0
16 Oct 2025
Programmatic Representation Learning with Language Models
Gabriel Poesia
Georgia Gabriela Sampaio
48
0
0
16 Oct 2025
LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet?
Bin Liu
Yanjie Zhao
Guoai Xu
Haoyu Wang
LLMAG
134
1
0
16 Oct 2025
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models
Guinan Su
Yanwu Yang
Li Shen
Lu Yin
Shiwei Liu
Jonas Geiping
MoE
KELM
156
1
0
16 Oct 2025
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Aayush Karan
Yilun Du
ReLM
OffRL
SyDa
AI4TS
LRM
220
5
0
16 Oct 2025
Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging
Bang An
Yibo Yang
Philip Torr
Bernard Ghanem
MoMe
158
0
0
16 Oct 2025
Attention Is All You Need for KV Cache in Diffusion LLMs
Quan Nguyen-Tri
Mukul Ranjan
Zhiqiang Shen
106
2
0
16 Oct 2025
RLSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following
Zhichao Wang
Andy Wong
Ruslan Belkin
ALM
LRM
103
0
0
16 Oct 2025
Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval
Subhendu Khatuya
Shashwat Naidu
Pawan Goyal
Niloy Ganguly
AIMat
ReLM
LRM
231
0
0
15 Oct 2025
David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation
Philipp Bauerfeind
Amir Salarpour
David Fernandez
Pedram MohajerAnsari
Johannes Reschke
Mert D. Pesé
80
0
0
15 Oct 2025
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Xiaozhe Li
TianYi Lyu
Siyi Yang
Yuxi Gong
Yizhao Yang
Jinxuan Huang
Ligao Zhang
Zhuoyi Huang
Qingwen Liu
ELM
175
0
0
15 Oct 2025
Breaking Memorization Barriers in LLM Code Fine-Tuning via Information Bottleneck for Improved Generalization
Changsheng Wang
Xin Chen
Sijia Liu
Ke Ding
CLL
140
0
0
15 Oct 2025
Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps
Ahmed Alzubaidi
Shaikha Alsuwaidi
Basma El Amel Boussaha
Leen AlQadi
Omar Alkaabi
Mohammed Alyafeai
Hamza Alobeidli
Hakim Hacid
ELM
142
1
0
15 Oct 2025
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
Yang Li
Z. Dong
Yuhan Sun
Weixun Wang
Shaopan Xiong
...
Han Lu
Jiamang Wang
Wenbo Su
Bo Zheng
Junchi Yan
LRM
99
2
0
15 Oct 2025
CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization
Henrique S. Assumpção
Diego Ferreira
Leandro Lacerda Campos
Fabricio Murai
94
0
0
15 Oct 2025
OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies
Peng Di
Faqiang Chen
X. Bai
Hongjun Yang
Qingfeng Li
...
Zhitao Shen
Zheng Li
Wenhui Shi
Junwei Guo
Hang Yu
140
0
0
15 Oct 2025
A Matter of Representation: Towards Graph-Based Abstract Code Generation
Nyx Iskandar
Hisham Bedri
Andy Tsen
108
0
0
15 Oct 2025
Training LLM Agents to Empower Humans
Evan Ellis
Vivek Myers
Jens Tuyls
Sergey Levine
Anca Dragan
Benjamin Eysenbach
166
0
0
15 Oct 2025
NOSA: Native and Offloadable Sparse Attention
Yuxiang Huang
Chaojun Xiao
Xu Han
Zhiyuan Liu
MQ
144
0
0
15 Oct 2025
Do Large Language Models Respect Contracts? Evaluating and Enforcing Contract-Adherence in Code Generation
Soohan Lim
Joonghyuk Hahn
Hyunwoo Park
Sang-Ki Ko
Yo-Sub Han
ALM
177
0
0
14 Oct 2025
A Survey on Parallel Reasoning
Z. Wang
Boye Niu
Zipeng Gao
Zhi Zheng
Tong Xu
...
Yilong Chen
Chen Zhu
Hua Wu
Haifeng Wang
Enhong Chen
ReLM
LRM
149
2
0
14 Oct 2025
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
Marco Del Tredici
Jacob McCarran
Benjamin Breen
Javier Aspuru Mijares
Weichen Winston Yin
Jacob M. Taylor
Frank Koppens
Dirk Englund
LRM
220
0
0
14 Oct 2025
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts
Yushu Zhao
Yubin Qin
Yang Wang
Xiaolong Yang
Huiming Han
Shaojun Wei
Yang Hu
Shouyi Yin
MoE
146
0
0
14 Oct 2025
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Hancheng Ye
Zhengqi Gao
Mingyuan Ma
Qinsi Wang
Yuzhe Fu
...
Yueqian Lin
Zhijian Liu
Jianyi Zhang
Danyang Zhuo
Yiran Chen
VLM
119
1
0
14 Oct 2025
Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?
Cedric Richter
Heike Wehrheim
76
0
0
14 Oct 2025