2107.03374
Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
Papers citing
"Evaluating Large Language Models Trained on Code"
50 / 856 papers shown
Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution
Zhi Chen
Wei Ma
Lingxiao Jiang
LLMAG
53
0
0
16 Mar 2025
Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models
Averi Bates
Ryan Vavricka
Shane Carleton
Ruosi Shao
Chongle Pan
59
0
0
15 Mar 2025
TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation
Mayank Kumar
J. Xue
Mengxin Zheng
Qian Lou
62
2
0
15 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
52
0
0
14 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
132
0
0
14 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
54
23
1
13 Mar 2025
Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning
Yuan Jiang
Yujian Zhang
Liang Lu
Christoph Treude
Xiaohong Su
Shan Huang
Tiantian Wang
ALM
61
0
0
12 Mar 2025
From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development -- An Opinion Paper
Sargam Yadav
Asifa Mehmood Qureshi
Abhishek Kaushik
Shubham Sharma
Roisin Loughran
...
Nikhil Singh
Padraic O'Hara
Pranay Jaiswal
Roshan Chandru
David Lillis
56
1
0
10 Mar 2025
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models
Anastasiia Grishina
Vadim Liventsev
Aki Härmä
Leon Moonen
ELM
79
0
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
66
0
0
09 Mar 2025
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li
Xin Zhang
Zhongxin Guo
Shaoguang Mao
Wen Luo
Guangyue Peng
Yangyu Huang
Houfeng Wang
Scarlett Li
57
0
0
09 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
C. Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
109
2
0
07 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC
AI4CE
65
2
0
06 Mar 2025
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions
Julian Aron Prenner
Romain Robbes
59
0
0
06 Mar 2025
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
Peiding Wang
L. Zhang
Fang Liu
Lin Shi
Minxiao Li
Bo Shen
An Fu
ELM
LRM
131
0
0
05 Mar 2025
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach
Hetarth Chopra
Vidhi Rambhia
Vikram Adve
MoMe
65
0
0
05 Mar 2025
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
Jie Wu
Haoling Li
Xin Zhang
Jianwen Luo
Yangyu Huang
Ruihang Chu
Y. Yang
Scarlett Li
73
0
0
04 Mar 2025
PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset
Haider Asif
Abdul Basit
Nouhaila Innan
Muhammad Kashif
Alberto Marchisio
Muhammad Shafique
72
1
0
04 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
117
5
0
03 Mar 2025
Cyber for AI at SemEval-2025 Task 4: Forgotten but Not Lost: The Balancing Act of Selective Unlearning in Large Language Models
Dinesh Srivasthav P
Bala Mallikarjunarao Garlapati
MU
44
0
0
02 Mar 2025
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv
Honglin Guo
Qipeng Guo
Xipeng Qiu
41
0
0
02 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
39
0
0
02 Mar 2025
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum
Y. Huang
Hongjian Zou
Qi Ding
Yixuan Liao
X. Chen
Qian Liu
Junxian He
64
2
0
02 Mar 2025
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology
Ludovico Mitchener
Jon M. Laurent
Benjamin Tenmann
Siddharth Narayanan
Geemi P Wellawatte
A. White
Lorenzo Sani
Samuel G. Rodriques
LLMAG
LM&MA
ELM
62
3
0
28 Feb 2025
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
62
0
0
26 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
J. Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Z. Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
79
7
0
26 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
86
3
0
24 Feb 2025
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley
Daniel Tan
Niels Warncke
Anna Sztyber-Betley
Xuchan Bao
Martín Soto
Nathan Labenz
Owain Evans
AAML
78
9
0
24 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
41
0
0
24 Feb 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
58
2
0
24 Feb 2025
R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning
Jinda Liu
Yi-Ju Chang
Yuan Wu
55
0
0
24 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
36
4
0
24 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
112
2
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
48
0
0
24 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur
Anastasis Kratsios
Florian Krach
Yannick Limmer
Jacob-Junqi Tian
John Willes
Blanka Horvath
Frank Rudzicz
MoE
45
0
0
24 Feb 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan
Zhenyi Lu
Sichen Liu
Xiaoye Qu
Wei Wei
Chengfeng Gu
Yu-Xi Cheng
MoE
130
0
0
24 Feb 2025
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Qiuhai Zeng
Claire Jin
Xinyue Wang
Yuhan Zheng
Qunhua Li
40
0
0
23 Feb 2025
The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own
Gokul Puthumanaillam
Timothy Bretl
Melkior Ornik
39
0
0
23 Feb 2025
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light
Wei Cheng
Wu Yue
Masafumi Oyamada
Mengdi Wang
Santiago Paternain
Haifeng Chen
ReLM
LRM
58
2
0
23 Feb 2025
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
Thanh Le-Cong
Bach Le
Toby Murray
LRM
47
1
0
22 Feb 2025
ARS: Automatic Routing Solver with Large Language Models
Kai Li
Fei Liu
Zhenkun Wang
Xialiang Tong
Xiongwei Han
Mingxuan Yuan
37
0
0
21 Feb 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Rui Ye
Xiaowen Dong
Y. Wang
Yanfeng Wang
S. Chen
SyDa
LLMAG
127
3
0
21 Feb 2025
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Seonil Son
Ju-Min Oh
Heegon Jin
Cheolhun Jang
Jeongbeom Jeong
Kuntae Kim
44
0
0
20 Feb 2025
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
Ruizhong Qiu
Weiliang Will Zeng
Hanghang Tong
James Ezick
Christopher Lott
88
15
0
20 Feb 2025
FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering
Yuan Sui
Yufei He
Nian Liu
Xiaoxin He
Kun Wang
Bryan Hooi
LRM
49
10
0
20 Feb 2025
DataSciBench: An LLM Agent Benchmark for Data Science
Dan Zhang
Sining Zhoubian
Min Cai
Fengzu Li
L. Yang
Wei Wang
Tianjiao Dong
Ziniu Hu
J. Tang
Yisong Yue
ALM
ELM
41
2
0
20 Feb 2025
InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua
Tyler Wong
Sun Fei
Liangming Pan
Adam Jardine
William Yang Wang
LRM
73
2
0
20 Feb 2025
Pragmatic Reasoning improves LLM Code Generation
Zhuchen Cao
Sven Apel
Adish Singla
Vera Demberg
LRM
37
0
0
20 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Z. Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
38
1
0
20 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
B. Wang
Weipeng Chen
VLM
198
2
0
20 Feb 2025