ResearchTrend.AI
Evaluating Large Language Models Trained on Code


7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
    ELM
    ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 883 papers shown
E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL
Hasan Alp Caferoğlu
Özgür Ulusoy
56
12
0
25 Sep 2024
Automating Traffic Model Enhancement with AI Research Agent
Xusen Guo
Xinxi Yang
Mingxing Peng
Hongliang Lu
Meixin Zhu
Hai Yang
62
0
0
25 Sep 2024
MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models
Wenhao Yu
Jie Peng
Yueliang Ying
Sai Li
Jianmin Ji
Yanyong Zhang
50
4
0
24 Sep 2024
ChatGPT as a Solver and Grader of Programming Exams written in Spanish
Pablo Fernández-Saborido
Marcos Fernández-Pichel
David E. Losada
ELM
40
0
0
23 Sep 2024
Co-occurrence is not Factual Association in Language Models
Xiao Zhang
Miao Li
Ji Wu
KELM
66
2
0
21 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
41
0
0
19 Sep 2024
AutoVerus: Automated Proof Generation for Rust Code
Chenyuan Yang
Xuheng Li
Md Rakib Hossain Misu
Jianan Yao
Weidong Cui
...
Jacob R. Lorch
Shuai Lu
Fan Yang
Ziqiao Zhou
Shan Lu
27
7
0
19 Sep 2024
Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts
Jenny T Liang
Melissa Lin
Nikitha Rao
Brad A. Myers
75
5
0
19 Sep 2024
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
Mingjie Liu
Yun-Da Tsai
Wenfei Zhou
Haoxing Ren
SyDa
3DV
45
6
0
19 Sep 2024
Enabling Real-Time Conversations with Minimal Training Costs
Wang Xu
Shuo Wang
Weilin Zhao
Xu Han
Yukun Yan
Yudi Zhang
Zhe Tao
Zhiyuan Liu
Wanxiang Che
19
4
0
18 Sep 2024
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Zachary S. Siegel
Sayash Kapoor
Nitya Nagdir
Benedikt Stroebl
Arvind Narayanan
29
8
0
17 Sep 2024
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer
Anmol Gautam
Kishore Kumar
Adarsh Jha
Mukunda NS
Ishaan Bhola
59
1
0
17 Sep 2024
StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?
Guobin Shen
Dongcheng Zhao
Aorigele Bao
Xiang-Yu He
Yiting Dong
Yi Zeng
31
1
0
14 Sep 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Ruiqi Zhong
Heng Wang
Dan Klein
Jacob Steinhardt
35
6
0
13 Sep 2024
Can Large Language Models Unlock Novel Scientific Research Ideas?
Sandeep Kumar
Tirthankar Ghosal
Vinayak Goyal
Asif Ekbal
ALM
LRM
AI4CE
31
10
0
10 Sep 2024
RAGent: Retrieval-based Access Control Policy Generation
Sakuna Jayasundara
N. Arachchilage
Giovanni Russello
51
1
0
08 Sep 2024
How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty
Sjoerd van Steenkiste
Tal Linzen
60
8
0
06 Sep 2024
Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis
Adrian Marius Dumitran
Adrian Catalin Badea
Stefan-Gabriel Muscalu
ELM
LRM
28
1
0
31 Aug 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip H. S. Torr
Mohamed Elhoseiny
Adel Bibi
74
9
0
27 Aug 2024
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
Jiace Zhu
Yingtao Shen
Jie Zhao
An Zou
LLMAG
LRM
27
4
0
25 Aug 2024
CortexCompile: Harnessing Cortical-Inspired Architectures for Enhanced Multi-Agent NLP Code Synthesis
Gautham Ramachandran
Rick Yang
32
0
0
23 Aug 2024
Relational decomposition for program synthesis
Céline Hocquette
Andrew Cropper
39
4
0
22 Aug 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen
Vashisth Tiwari
Ranajoy Sadhukhan
Zhuoming Chen
Jinyuan Shi
Ian En-Hsu Yen
Avner May
Tianqi Chen
Beidi Chen
LRM
31
22
0
20 Aug 2024
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs
Jiancheng Dong
Lei Jiang
Wei Jin
Lu Cheng
36
1
0
18 Aug 2024
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting
Emmanuel Aboah Boateng
Cassiano O. Becker
Nabiha Asghar
Kabir Walia
Ashwin Srinivasan
Ehi Nosakhare
Victor Dibia
Soundar Srinivasan
LRM
31
0
0
18 Aug 2024
CogLM: Tracking Cognitive Development of Large Language Models
Xinglin Wang
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
ELM
67
0
0
17 Aug 2024
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu
Jian Yang
Linzheng Chai
Ge Zhang
Jiaheng Liu
...
Xianfu Cheng
Tianzhen Sun
Guanglin Niu
Tongliang Li
Zhoujun Li
LMTD
ELM
65
19
0
17 Aug 2024
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Tianyu Liu
Yun Li
Qitan Lv
Kai Liu
Jianchen Zhu
Winston Hu
X. Sun
53
12
0
13 Aug 2024
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
Toufique Ahmed
Premkumar Devanbu
Christoph Treude
Michael Pradel
70
11
0
10 Aug 2024
Retrieval-augmented code completion for local projects using large language models
Marko Hostnik
Marko Robnik-Sikonja
RALM
27
0
0
09 Aug 2024
TextIM: Part-aware Interactive Motion Synthesis from Text
Siyuan Fan
Bo Du
Xiantao Cai
Bo Peng
Longling Sun
DiffM
40
1
0
06 Aug 2024
LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning
Lekai Chen
Ashutosh Trivedi
Alvaro Velasquez
21
1
0
06 Aug 2024
CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs
Weijie Lv
Xuan Xia
Sheng-Jun Huang
ALM
34
2
0
05 Aug 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Haolin Jin
Linghan Huang
Haipeng Cai
Jun Yan
Bo Li
Huaming Chen
78
24
0
05 Aug 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
...
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Zenan Zhou
ELM
76
16
0
02 Aug 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Mengkang Hu
Yixiao Wang
Can Xu
Lingfeng Sun
Chensheng Peng
T. Hannagan
Nicola Poerio
Saravan Rajmohan
LM&Ro
LLMAG
69
15
0
01 Aug 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu
Zhanming Jie
Hanze Dong
Lei Wang
Xudong Lu
Aojun Zhou
Amrita Saha
Caiming Xiong
Doyen Sahoo
67
14
0
30 Jul 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
CLL
72
1
0
30 Jul 2024
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
42
4
0
29 Jul 2024
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
Somshubra Majumdar
Vahid Noroozi
Sean Narenthiran
Aleksander Ficek
Wasi Uddin Ahmad
Jocelyn Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
54
2
0
29 Jul 2024
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
Zhengbo Wang
Jian Liang
Ran He
Zilei Wang
Tieniu Tan
50
15
0
25 Jul 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
M. Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
34
37
0
19 Jul 2024
People use fast, goal-directed simulation to reason about novel games
Cedegao E. Zhang
Katherine M. Collins
L. Wong
Adrian Weller
Joshua B. Tenenbaum
LRM
35
0
0
19 Jul 2024
CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization
Yang Zhao
Di Huang
Chongxiao Li
Pengwei Jin
Muxin Song
...
Rui Zhang
Xingui Hu
Yunji Chen
Qi Guo
Xing Hu
69
22
0
15 Jul 2024
Benchmarking Language Model Creativity: A Case Study on Code Generation
Yining Lu
Dixuan Wang
Tianjian Li
Dongwei Jiang
Daniel Khashabi
Meng Jiang
LRM
54
10
0
12 Jul 2024
Learning Program Behavioral Models from Synthesized Input-Output Pairs
Tural Mammadov
Dietrich Klakow
Alexander Koller
Andreas Zeller
39
3
0
11 Jul 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
73
9
0
09 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
66
4
0
08 Jul 2024
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu
Abhay Puri
Juan A. Rodriguez
Alexandre Drouin
Perouz Taslakian
...
Christopher Pal
Nicolas Chapados
I. Laradji
Sai Rajeswar Mudumba
Issam Hadj Laradji
ELM
46
4
0
08 Jul 2024
Lucy: Think and Reason to Solve Text-to-SQL
Nina Narodytska
S. Vargaftik
LMTD
ReLM
AI4TS
LRM
35
2
0
06 Jul 2024