Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.03374
Cited By
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evaluating Large Language Models Trained on Code"
50 / 868 papers shown
Title
NL2KQL: From Natural Language to Kusto Query
Amir H. Abdi
Xinye Tang
Jeremias Eichelbaum
Mahan Das
Alex Klein
...
William Blum
Daniel L Mace
Tanvi Raja
Namrata Padmanabhan
Ye Xing
60
2
0
20 Jan 2025
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks
Yaojie Hu
Qiang Zhou
Qihong Chen
Xiaopeng Li
Linbo Liu
Dejiao Zhang
Amit Kachroo
Talha Oz
Omer Tripp
66
4
0
20 Jan 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
116
14
0
20 Jan 2025
Aligning Instruction Tuning with Pre-training
Yiming Liang
Tianyu Zheng
Xinrun Du
Ge Zhang
J. Liu
...
Zhaoxiang Zhang
Wenhao Huang
Jiajun Zhang
Xiang Yue
Jiajun Zhang
86
1
0
16 Jan 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen
Manish Shetty
Gagan Somashekar
Minghua Ma
Yogesh L. Simmhan
Jonathan Mace
Chetan Bansal
Rujia Wang
Saravan Rajmohan
51
1
0
12 Jan 2025
RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot
Spandan Garg
Roshanak Zilouchian Moghaddam
Neel Sundaresan
59
9
0
10 Jan 2025
Planning-Driven Programming: A Large Language Model Programming Workflow
Chao Lei
Yanchuan Chang
N. Lipovetzky
Krista A. Ehinger
81
2
0
10 Jan 2025
Soup to go: mitigating forgetting during continual learning with model averaging
Anat Kleiman
Gintare Karolina Dziugaite
Jonathan Frankle
Sham Kakade
Mansheej Paul
MoMe
CLL
KELM
51
0
0
09 Jan 2025
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion
Zhaoyi Yan
Zhijie Sang
Y. Zhang
Yuhao Fu
Baoyi He
Qi Zhou
Yining Di
Chunlin Ji
Shengyu Zhang
Fei Wu
MoMe
LRM
60
1
0
06 Jan 2025
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Junqiao Wang
Zeng Zhang
Yangfan He
Yuyang Song
Tianyu Shi
...
Hengyuan Xu
Kunyu Wu
Guangwu Qian
Qiuwu Chen
Lewei He
38
11
0
03 Jan 2025
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
H. Zhang
Xiaoman Pan
Hongwei Wang
Kaixin Ma
W. Yu
Dong Yu
LLMAG
61
3
0
03 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
76
216
0
03 Jan 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
36
2
0
03 Jan 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
OSLM
LRM
108
408
0
03 Jan 2025
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Hiroki Furuta
Yutaka Matsuo
Aleksandra Faust
Izzeddin Gur
CLL
87
13
0
03 Jan 2025
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Shanghaoran Quan
Jiaxi Yang
Bowen Yu
Bo Zheng
Dayiheng Liu
...
Zeyu Cui
Yang Fan
Y. Zhang
Binyuan Hui
Junyang Lin
ALM
ELM
LRM
72
15
0
02 Jan 2025
Chain-of-Translation Prompting (CoTR): A Novel Prompting Technique for Low Resource Languages
Tejas Deshpande
Nidhi Kowtal
Raviraj Joshi
LRM
47
1
0
31 Dec 2024
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code
Batu Guan
Yao Wan
Zhangqian Bi
Zheng Wang
Hongyu Zhang
Yulei Sui
Pan Zhou
32
8
0
31 Dec 2024
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng
Yuzhen Huang
Lulu Zhao
Yijun Wang
Zifei Shan
Junxian He
LRM
35
7
0
23 Dec 2024
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
Huawen Feng
Pu Zhao
Qingfeng Sun
Can Xu
Fangkai Yang
...
Qianli Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
AAML
ALM
62
0
0
23 Dec 2024
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Xiangxiang Gao
Weisheng Xie
Yiwei Xiang
Feng Ji
82
5
0
17 Dec 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng
Xiao-Chang Liu
C. Wang
Xiaotao Gu
Y. Lu
Dan Zhang
Yuxiao Dong
J. Tang
Hongning Wang
Minlie Huang
LRM
123
3
0
16 Dec 2024
Empowering LLMs to Understand and Generate Complex Vector Graphics
Ximing Xing
Juncheng Hu
Guotao Liang
Jing Zhang
Dong Xu
Qian Yu
92
7
0
15 Dec 2024
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo
S. Kamath S
Leshem Choshen
Yuekai Sun
Mikhail Yurochkin
89
5
0
09 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
127
5
0
04 Dec 2024
Yi-Lightning Technical Report
01. AI
:
Alan Wake
Albert Wang
Bei Chen
...
Yuxuan Sha
Zhaodong Yan
Zhiyuan Liu
Zirui Zhang
Zonghong Dai
OSLM
102
3
0
02 Dec 2024
Are Large Language Models Memorizing Bug Benchmarks?
Daniel Ramos
Claudia Mamede
Kush Jain
Paulo Canelas
Catarina Gamboa
Claire Le Goues
PILM
ELM
94
6
0
20 Nov 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri
Bartłomiej Cupiał
Samuel Coward
Ulyana Piterbarg
Maciej Wolczyk
...
Lerrel Pinto
Rob Fergus
Jakob Foerster
Jack Parker-Holder
Tim Rocktaschel
LLMAG
LRM
108
10
0
20 Nov 2024
Human-In-the-Loop Software Development Agents
Wannita Takerngsaksiri
Jirat Pasuksmit
Patanamon Thongtanunam
C. Tantithamthavorn
Ruixiong Zhang
Fan Jiang
Jing Li
Evan Cook
K. Chen
Ming Wu
LLMAG
97
1
0
19 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
62
46
1
15 Nov 2024
PyGen: A Collaborative Human-AI Approach to Python Package Creation
Saikat Barua
Mostafizur Rahman
Md Jafor Sadek
Rafiul Islam
Shehnaz Khaled
Md. Shohrab Hossain
44
1
0
13 Nov 2024
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei
Jixuan Chen
Yuxiao Ye
Ruisheng Cao
Dongchan Shin
...
Caiming Xiong
Ruoxi Sun
Qian Liu
Sida I. Wang
Tao Yu
LMTD
77
21
0
12 Nov 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
Runtao Liu
LRM
ReLM
ELM
AIMat
62
1
0
11 Nov 2024
Benchmarking Large Language Models with Integer Sequence Generation Tasks
Daniel O'Malley
Manish Bhattarai
Javier E. Santos
LRM
43
0
0
07 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
77
17
0
07 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
38
1
0
05 Nov 2024
MdEval: Massively Multilingual Code Debugging
Shukai Liu
Linzheng Chai
Jian Yang
Jiajun Shi
He Zhu
...
Yu Hao
Liqun Yang
Guanglin Niu
Ge Zhang
Z. Li
LRM
ELM
70
6
0
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Y. Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
49
3
0
04 Nov 2024
A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?
QiHong Chen
Jiawei Li
Jiecheng Deng
Jiachen Yu
Justin Tian Jin Chen
Iftekhar Ahmed
56
0
0
03 Nov 2024
TabVer: Tabular Fact Verification with Natural Logic
Rami Aly
Andreas Vlachos
LMTD
28
0
0
02 Nov 2024
Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen
Haoran Li
Zihao Zheng
Y. Song
Dekai Wu
Bryan Hooi
SILM
AAML
50
4
0
01 Nov 2024
Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models
Nirmal Joshua Kapu
Mihit Sreejith
35
0
0
30 Oct 2024
SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
Xiao Xia
Dan Zhang
Zibo Liao
Zhenyu Hou
Tianrui Sun
Jing Li
Ling Fu
Yuxiao Dong
AI4CE
LM&Ro
3DV
LLMAG
38
2
0
29 Oct 2024
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux
Çağlar Gülçehre
44
2
0
28 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
49
3
0
24 Oct 2024
Process Supervision-Guided Policy Optimization for Code Generation
Ning Dai
Zheng Wu
Renjie Zheng
Ziyun Wei
Wenlei Shi
Xing Jin
Guanlin Liu
Chen Dun
Liang Huang
Lin Yan
54
7
0
23 Oct 2024
In Context Learning and Reasoning for Symbolic Regression with Large Language Models
Samiha Sharlin
Tyler R. Josephson
ReLM
LLMAG
LRM
42
1
0
22 Oct 2024
Can Large Language Models Invent Algorithms to Improve Themselves?
Yoichi Ishibashi
Taro Yano
Masafumi Oyamada
AIFin
LRM
34
1
0
21 Oct 2024
Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework
Yuchen Wang
Shangxin Guo
C. Tan
32
0
0
20 Oct 2024
MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification
Yin Li
Liangwei Wang
Shiyuan Piao
Boo-Ho Yang
Ziyue Li
Wei Zeng
Fugee Tsung
31
0
0
19 Oct 2024
Previous
1
2
3
4
5
6
...
16
17
18
Next