2305.06161
StarCoder: may the source be with you!
9 May 2023
Raymond Li
Loubna Ben Allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
Chenghao Mou
Marc Marone
Christopher Akiki
Jia Li
Jenny Chim
Qian Liu
Evgenii Zheltonozhskii
Terry Yue Zhuo
Thomas Wang
Olivier Dehaene
Mishig Davaadorj
J. Lamy-Poirier
João Monteiro
Oleh Shliazhko
Nicolas Angelard-Gontier
Nicholas Meade
A. Zebaze
Ming-Ho Yee
Logesh Kumar Umapathi
Jian Zhu
Benjamin Lipkin
Muhtasham Oblokulov
Zhiruo Wang
Rudra Murthy
Jason T Stillerman
S. Patel
Dmitry Abulkhanov
Marco Zocca
Manan Dey
Zhihan Zhang
N. Fahmy
Urvashi Bhattacharyya
W. Yu
Swayam Singh
Sasha Luccioni
Paulo Villegas
M. Kunakov
Fedor Zhdanov
Manuel Romero
Tony Lee
Nadav Timor
Jennifer Ding
Claire Schlesinger
Hailey Schoelkopf
Jana Ebert
Tri Dao
Mayank Mishra
A. Gu
Jennifer Robinson
Carolyn Jane Anderson
Brendan Dolan-Gavitt
Danish Contractor
Siva Reddy
Daniel Fried
Dzmitry Bahdanau
Yacine Jernite
Carlos Muñoz Ferrandis
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
H. D. Vries
Papers citing
"StarCoder: may the source be with you!"
33 / 83 papers shown
Title
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Parshin Shojaee
Kazem Meidani
Shashank Gupta
A. Farimani
Chandan K. Reddy
37
13
0
29 Apr 2024
SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities
Kunato Nishina
Yusuke Matsui
27
7
0
21 Apr 2024
Semi-Instruct: Bridging Natural-Instruct and Self-Instruct for Code Large Language Models
Xianzhen Luo
Qingfu Zhu
Zhiming Zhang
Xu Wang
Qing Yang
Dongliang Xu
Wanxiang Che
ALM
19
2
0
01 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Zhen Guo
Adriana Meza Soria
Wei Sun
Yikang Shen
Rameswar Panda
ELM
ALM
37
1
0
14 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
112
347
0
09 Feb 2024
Text-to-Code Generation with Modality-relative Pre-training
Fenia Christopoulou
Guchun Zhang
Gerasimos Lampouras
AI4TS
11
1
0
08 Feb 2024
On the Standardization of Behavioral Use Clauses and Their Adoption for Responsible Licensing of AI
Daniel J. McDuff
Tim Korjakow
Scott Cambo
Jesse Josua Benjamin
Jenny Lee
...
Aaron Gokaslan
Alek Tarkowski
Joseph Lindley
A. F. Cooper
Danish Contractor
MedIm
18
7
0
07 Feb 2024
Temporal Blind Spots in Large Language Models
Jonas Wallat
Adam Jatowt
Avishek Anand
29
3
0
22 Jan 2024
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
22
61
0
19 Jan 2024
DebugBench: Evaluating Debugging Capability of Large Language Models
Runchu Tian
Yining Ye
Yujia Qin
Xin Cong
Yankai Lin
...
Yesai Wu
Haotian Hui
Weichuan Liu
Zhiyuan Liu
Maosong Sun
ELM
19
28
0
09 Jan 2024
CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset
Le Chen
Arijit Bhattacharjee
Nesreen K. Ahmed
N. Hasabnis
Gal Oren
Bin Lei
Ali Jannesari
LRM
16
3
0
11 Nov 2023
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
13
20
0
16 Oct 2023
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
Hung Le
Hailin Chen
Amrita Saha
Akash Gokul
Doyen Sahoo
Shafiq R. Joty
LRM
23
41
0
13 Oct 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
34
150
0
05 Sep 2023
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
33
20
0
03 Sep 2023
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
M. Wong
Shangxin Guo
Ching Nam Hang
Siu-Wai Ho
C. Tan
25
77
0
04 Jul 2023
Is Self-Repair a Silver Bullet for Code Generation?
Theo X. Olausson
J. Inala
Chenglong Wang
Jianfeng Gao
Armando Solar-Lezama
LRM
14
108
0
16 Jun 2023
The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
Anastasiia Grishina
Max Hort
Leon Moonen
16
6
0
08 May 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
90
124
0
01 May 2023
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
Fengji Zhang
B. Chen
Yue Zhang
Jacky Keung
Jin Liu
Daoguang Zan
Yi Mao
Jian-Guang Lou
Weizhu Chen
25
218
0
22 Mar 2023
Data Portraits: Recording Foundation Model Training Data
Marc Marone
Benjamin Van Durme
129
30
0
06 Mar 2023
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models
Hossein Hajipour
Keno Hassler
Thorsten Holz
Lea Schonherr
Mario Fritz
ELM
22
19
0
08 Feb 2023
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
13
1
0
12 Dec 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
193
624
0
26 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
201
1,451
0
02 Sep 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
190
853
0
09 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
242
1,977
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020