Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.11280
Cited By
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
22 July 2022
Fenia Christopoulou
Gerasimos Lampouras
Milan Gritta
Guchun Zhang
Yinpeng Guo
Zhong-yi Li
Qi Zhang
M. Xiao
Bo Shen
Lin Li
Hao Yu
Li-yu Yan
Pingyi Zhou
Xin Wang
Yuchi Ma
Ignacio Iacobacci
Yasheng Wang
Guangtai Liang
Jia Wei
Xin Jiang
Qianxiang Wang
Qun Liu
ELM
SyDa
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PanGu-Coder: Program Synthesis with Function-Level Language Modeling"
50 / 51 papers shown
Title
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Mihai Nadas
Laura Diosan
Andreea Tomescu
SyDa
67
0
0
18 Mar 2025
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks
Xin Zhou
M. Weyssow
Ratnadira Widyasari
Ting Zhang
Junda He
Yunbo Lyu
Jianming Chang
Beiqi Zhang
Dan Huang
David Lo
PILM
166
1
0
10 Feb 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
114
13
0
20 Jan 2025
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Tianyu Zheng
Ge Zhang
Tianhao Shen
Xueling Liu
Bill Yuchen Lin
Jie Fu
Wenhu Chen
Xiang Yue
SyDa
76
102
0
08 Jan 2025
FDI: Attack Neural Code Generation Systems through User Feedback Channel
Zhensu Sun
Xiaoning Du
Xiapu Luo
Fu Song
David Lo
Li Li
AAML
23
3
0
08 Aug 2024
A Performance Study of LLM-Generated Code on Leetcode
Tristan Coignion
Clément Quinton
Romain Rouvoy
35
25
0
31 Jul 2024
CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization
Yang Zhao
Di Huang
Chongxiao Li
Pengwei Jin
Muxin Song
...
Rui Zhang
Xingui Hu
Yunji Chen
Qi Guo
Xing Hu
67
22
0
15 Jul 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
18
1
0
08 Jul 2024
Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing
Anushrut Jignasu
Kelly O. Marshall
Ankush Kumar Mishra
Lucas Nerone Rillo
Baskar Ganapathysubramanian
Aditya Balu
Chinmay Hegde
Adarsh Krishnamurthy
30
0
0
04 Jul 2024
Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs
Ahmad Mohsin
Helge Janicke
Adrian Wood
Iqbal H. Sarker
Leandros A. Maglaras
N. Janjua
28
8
0
18 Jun 2024
A Survey on Large Language Models for Code Generation
Juyong Jiang
Fan Wang
Jiasi Shen
Sungju Kim
Sunghun Kim
40
158
0
01 Jun 2024
Large Language Models Meet NLP: A Survey
Libo Qin
Qiguang Chen
Xiachong Feng
Yang Wu
Yongheng Zhang
Yinghui Li
Min Li
Wanxiang Che
Philip S. Yu
ALM
LM&MA
ELM
LRM
38
44
0
21 May 2024
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
Daoguang Zan
Ailun Yu
Wei Liu
Dong Chen
Bo Shen
...
Bei Guan
Zhiguang Yang
Yongji Wang
Qianxiang Wang
Li-zhen Cui
20
14
0
25 Mar 2024
Bugs in Large Language Models Generated Code: An Empirical Study
Florian Tambon
Arghavan Moradi Dakhel
Amin Nikanjam
Foutse Khomh
Michel C. Desmarais
G. Antoniol
ELM
29
33
0
13 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
REPOFUSE: Repository-Level Code Completion with Fused Dual Context
Ming Liang
Xiaoheng Xie
Gehao Zhang
Xunjin Zheng
Peng Di
Wei Jiang
Hongwei Chen
Chengpeng Wang
Gang Fan
24
14
0
22 Feb 2024
RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian
Adrian Cosma
Ioan-Bogdan Iordache
Paolo Rosso
OffRL
22
2
0
20 Feb 2024
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
Demin Song
Honglin Guo
Yunhua Zhou
Shuhao Xing
Yudong Wang
...
Wenwei Zhang
Qipeng Guo
Hang Yan
Xipeng Qiu
Dahua Lin
SyDa
50
6
0
20 Feb 2024
Text-to-Code Generation with Modality-relative Pre-training
Fenia Christopoulou
Guchun Zhang
Gerasimos Lampouras
AI4TS
13
1
0
08 Feb 2024
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Dong Huang
Yuhao Qing
Weiyi Shang
Heming Cui
Jie M. Zhang
77
30
0
03 Feb 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Shihan Dou
Yan Liu
Haoxiang Jia
Limao Xiong
Enyu Zhou
...
Tao Ji
Rui Zheng
Qi Zhang
Xuanjing Huang
Tao Gui
LLMAG
54
28
0
02 Feb 2024
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Yao Wan
Yang He
Zhangqian Bi
Jianguo Zhang
Hongyu Zhang
Yulei Sui
Guandong Xu
Hai Jin
Philip S. Yu
20
20
0
30 Dec 2023
Refactoring Programs Using Large Language Models with Few-Shot Examples
Atsushi Shirafuji
Yusuke Oda
Jun Suzuki
Makoto Morishita
Yutaka Watanobe
19
33
0
20 Nov 2023
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
Xiangru Tang
Yuliang Liu
Zefan Cai
Yan Shao
Junjie Lu
...
Yujia Qin
Wangchunshu Zhou
Yilun Zhao
Arman Cohan
Mark B. Gerstein
ELM
LLMAG
30
17
0
16 Nov 2023
Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis
P. Gorinski
Matthieu Zimmer
Gerasimos Lampouras
Derrick-Goh-Xin Deik
Ignacio Iacobacci
ALM
OffRL
30
3
0
20 Oct 2023
LLM for SoC Security: A Paradigm Shift
Dipayan Saha
Shams Tarek
Katayoon Yahyaei
S. Saha
Jingbo Zhou
M. Tehranipoor
Farimah Farahmandi
54
46
0
09 Oct 2023
At Which Training Stage Does Code Data Help LLMs Reasoning?
Xiaogang Jia
Yue Liu
Yue Yu
Yuanliang Zhang
Yu Jiang
Changjian Wang
Shanshan Li
LRM
SyDa
13
57
0
28 Sep 2023
OpenAi's GPT4 as coding assistant
Lefteris Moussiades
G. Zografos
LM&MA
ELM
19
1
0
22 Sep 2023
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
Fernanda De La Torre
Cathy Mengying Fang
Han Huang
Andrzej Banburski-Fahey
Judith Amores Fernandez
Jaron Lanier
33
45
0
21 Sep 2023
Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding
André Storhaug
Jingyue Li
Tianyuan Hu
AAML
21
14
0
18 Sep 2023
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Lingyue Fu
Huacan Chai
Shuang Luo
Kounianhua Du
Weiming Zhang
...
Jingkuan Wang
Siyuan Qi
Kangning Zhang
Weinan Zhang
Yong Yu
ELM
11
9
0
05 Sep 2023
BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models
Xiangru Tang
Bill Qian
Rick Gao
Jiakang Chen
Xinyun Chen
Mark B. Gerstein
16
10
0
31 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
55
116
0
14 Aug 2023
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
Xueying Du
Mingwei Liu
Kaixin Wang
Hanlin Wang
Junwei Liu
Yixuan Chen
Jiayi Feng
Chaofeng Sha
Xin Peng
Yiling Lou
ELM
ALM
24
137
0
03 Aug 2023
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Bo Shen
Jiaxin Zhang
Taihong Chen
Daoguang Zan
Bing Geng
...
Ailun Yu
Jichuan Ji
Jingyang Zhao
Yuenan Guo
Qianxiang Wang
ALM
ELM
25
73
0
27 Jul 2023
RLTF: Reinforcement Learning from Unit Test Feedback
Jiate Liu
Yiqin Zhu
Kaiwen Xiao
Qiang Fu
Xiao Han
Wei Yang
Deheng Ye
OffRL
42
53
0
10 Jul 2023
Exploring the Robustness of Large Language Models for Solving Programming Problems
Atsushi Shirafuji
Yutaka Watanobe
Takumi Ito
Makoto Morishita
Yuki Nakamura
Yusuke Oda
Jun Suzuki
ELM
31
17
0
26 Jun 2023
Type Prediction With Program Decomposition and Fill-in-the-Type Training
Federico Cassano
Ming-Ho Yee
Noah Shinn
Arjun Guha
Steven Holtzen
24
5
0
25 May 2023
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Xiaozhe Ren
Pingyi Zhou
Xinfan Meng
Xinjing Huang
Yadao Wang
...
Jiansheng Wei
Xin Jiang
Teng Su
Qun Liu
Jun Yao
ALM
MoE
67
60
0
20 Mar 2023
Exploring Data Augmentation for Code Generation Tasks
Pinzhen Chen
Gerasimos Lampouras
13
9
0
05 Feb 2023
Measuring The Impact Of Programming Language Distribution
Gabriel Orlanski
Kefan Xiao
Xavier Garcia
Jeffrey Hui
Joshua Howland
J. Malmaud
Jacob Austin
Rishah Singh
Michele Catasta
20
19
0
03 Feb 2023
SantaCoder: don't reach for the stars!
Loubna Ben Allal
Raymond Li
Denis Kocetkov
Chenghao Mou
Christopher Akiki
...
Sean M. Hughes
Daniel Fried
Arjun Guha
H. D. Vries
Leandro von Werra
19
189
0
09 Jan 2023
Large Language Models Meet NL2Code: A Survey
Daoguang Zan
B. Chen
Fengji Zhang
Di Lu
Bingchao Wu
Bei Guan
Yongji Wang
Jian-Guang Lou
ELM
ALM
26
166
0
19 Dec 2022
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
15
1
0
12 Dec 2022
The Stack: 3 TB of permissively licensed source code
Denis Kocetkov
Raymond Li
Loubna Ben Allal
Jia Li
Chenghao Mou
...
Sean M. Hughes
Thomas Wolf
Dzmitry Bahdanau
Leandro von Werra
H. D. Vries
34
305
0
20 Nov 2022
Text and Code Embeddings by Contrastive Pre-Training
Arvind Neelakantan
Tao Xu
Raul Puri
Alec Radford
Jesse Michael Han
...
Tabarak Khan
Toki Sherbakov
Joanne Jang
Peter Welinder
Lilian Weng
SSL
AI4TS
204
420
0
24 Jan 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
210
1,485
0
02 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
623
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
196
853
0
09 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
1
2
Next