Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.03374
Cited By
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evaluating Large Language Models Trained on Code"
50 / 889 papers shown
Title
InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao
Zhen Ge
Sujay Sanghavi
Tian Wang
Julian Katz-Samuels
Marc Versage
Qingjun Cui
Trishul M. Chilimbi
29
0
0
13 May 2025
Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions
K. Teranishi
Harshitha Menon
William F. Godoy
Prasanna Balaprakash
David Bau
...
Swaroop Pophale
Pedro Valero-Lara
Jeffrey S. Vetter
Samuel Williams
Aaron R. Young
28
0
0
13 May 2025
Small but Significant: On the Promise of Small Language Models for Accessible AIED
Yumou Wei
Paulo Carvalho
John Stamper
SyDa
40
0
0
13 May 2025
CAD-Coder:Text-Guided CAD Files Code Generation
Changqi He
Shuhan Zhang
Liguo Zhang
Jiajun Miao
29
0
0
13 May 2025
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
Aiyao He
Sijia Cui
Shuai Xu
Yanna Wang
Bo Xu
39
0
0
13 May 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng
A. Goel
Hakan Bilen
LRM
29
0
0
12 May 2025
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Kai Xu
YiWei Mao
XinYi Guan
ZiLong Feng
38
0
0
12 May 2025
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
Xiaotian Lin
Yanlin Qi
Yizhang Zhu
Themis Palpanas
Chengliang Chai
Nan Tang
Yuyu Luo
23
0
0
12 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
23
0
0
12 May 2025
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang
Yuchen Zhuang
Yinghao Li
D. Kilman
Rongzhi Zhang
...
Ian Shu-Hei Wong
Sherry Yang
Percy Liang
Chao Zhang
Bo Dai
ELM
39
0
0
12 May 2025
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding
Yifeng Di
Tianyi Zhang
26
0
0
12 May 2025
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Y. Li
Haojie Wang
Biao Hou
Jidong Zhai
38
0
0
12 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
55
0
0
10 May 2025
KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery
Yumou Wei
Paulo Carvalho
John Stamper
AI4Ed
53
1
0
09 May 2025
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
39
1
0
09 May 2025
Camera Control at the Edge with Language Models for Scene Understanding
Alexiy Buynitsky
Sina Ehsani
Bhanu Pallakonda
Pragyana Mishra
VLM
35
0
0
09 May 2025
AgentXploit: End-to-End Redteaming of Black-Box AI Agents
Zhun Wang
Vincent Siu
Zhe Ye
Tianneng Shi
Yuzhou Nie
Xuandong Zhao
Chenguang Wang
Wenbo Guo
Dawn Song
LLMAG
AAML
36
0
0
09 May 2025
PRIMG : Efficient LLM-driven Test Generation Using Mutant Prioritization
Mohamed Salah Bouafif
Mohammad Hamdaqa
Edward Zulkoski
24
0
0
08 May 2025
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
31
0
0
08 May 2025
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
45
0
0
08 May 2025
YABLoCo: Yet Another Benchmark for Long Context Code Generation
Aidar Valeev
Roman Garaev
Vadim Lomshakov
Irina Piontkovskaya
Vladimir Ivanov
Israel Adewuyi
40
0
0
07 May 2025
An Empirical Study of OpenAI API Discussions on Stack Overflow
Xiang Chen
J. Wang
Chaoyang Gao
Xiaolin Ju
Zhanqi Cui
ELM
53
0
0
07 May 2025
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Y. Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
H. Li
LLMAG
45
0
0
06 May 2025
SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation
Yu-Ren Guo
Wen-Kai Tai
55
0
0
06 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
S. Song
Ce Zhang
James Y. Zou
ALM
31
0
0
05 May 2025
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
49
0
0
05 May 2025
AKD : Adversarial Knowledge Distillation For Large Language Models Alignment on Coding tasks
Ilyas Oulkadda
Julien Perez
ALM
42
0
0
05 May 2025
Incentivizing Inclusive Contributions in Model Sharing Markets
Enpei Zhang
Jingyi Chai
Rui Ye
Yanfeng Wang
Siheng Chen
TDI
FedML
136
0
0
05 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
50
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
57
0
0
04 May 2025
Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
M. Dearing
Yiheng Tao
Xingfu Wu
Z. Lan
V. Taylor
33
0
0
04 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
131
0
0
04 May 2025
QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach
Shouyang Dong
Yuanbo Wen
Jun Bi
Di Huang
Jiaming Guo
...
Yifan Hao
Xuehai Zhou
Tianshi Chen
Qi Guo
Yunji Chen
24
0
0
04 May 2025
Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
Alice Rueda
Mohammed S. Hassan
Argyrios Perivolaris
Bazen G. Teferra
Reza Samavi
...
Y. Wu
Y. Zhang
Bo Cao
Divya Sharma
Sridhar Krishnan Venkat Bhat
ELM
LRM
58
0
0
02 May 2025
PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding
Bradley McDanel
S. Zhang
Y. Hu
Zining Liu
MoE
110
0
0
02 May 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Xing Hu
Zhixuan Chen
Dawei Yang
Zukang Xu
Chen Xu
Zhihang Yuan
Sifan Zhou
Jiangyong Yu
MoE
MQ
39
0
0
02 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
69
0
0
01 May 2025
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
D. Sculley
Will Cukierski
Phil Culliton
Sohier Dane
Maggie Demkin
...
Addison Howard
Paul Mooney
Walter Reade
Megan Risdal
Nate Keating
31
0
0
01 May 2025
LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs
Baleegh Ahmad
Hammond Pearce
Ramesh Karri
Benjamin Tan
34
0
0
30 Apr 2025
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
Sizhe Wang
Z. Wang
Dongsheng Ma
Yongan Yu
Rui Ling
Z. Li
Feiyu Xiong
W. Zhang
LRM
60
0
0
30 Apr 2025
An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding
Xiuwei Shang
Zhenkan Fu
Shaoyin Cheng
Guoqiang Chen
Gangyang Li
Li Hu
W. Zhang
N. Yu
62
0
0
30 Apr 2025
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Yunseo Lee
John Youngeun Song
Dongsun Kim
Jindae Kim
Mijung Kim
Jaechang Nam
HILM
LRM
37
0
0
29 Apr 2025
Turing Machine Evaluation for Large Language Model
Haitao Wu
Zongbo Han
Huaxi Huang
Changqing Zhang
ELM
LRM
62
0
0
29 Apr 2025
SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories
Connor Dilgren
Purva Chiniya
Luke Griffith
Yu Ding
Yizheng Chen
42
0
0
29 Apr 2025
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement
Manish Bhattarai
Miguel Cordova
Javier E. Santos
Dan O’Malley
39
0
0
29 Apr 2025
CrashFixer: A crash resolution agent for the Linux kernel
Alex Mathai
Chenxi Huang
Suwei Ma
Jihwan Kim
Hailie Mitchell
Aleksandr Nogikh
Petros Maniatis
Franjo Ivančić
Junfeng Yang
Baishakhi Ray
60
0
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
77
0
0
29 Apr 2025
Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs
Paiheng Xu
Gang Wu
Xiang Chen
Tong Yu
Chang Xiao
Franck Dernoncourt
Tianyi Zhou
Wei Ai
Viswanathan Swaminathan
OffRL
52
0
0
29 Apr 2025
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Y. Chen
Haoran Li
Yuan Sui
Y. Liu
Yufei He
Y. Song
Bryan Hooi
AAML
SILM
63
0
0
29 Apr 2025
OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification
Shangyu Li
Juyong Jiang
Tiancheng Zhao
Jiasi Shen
49
0
0
29 Apr 2025
1
2
3
4
...
16
17
18
Next