Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.03988
Cited By
SantaCoder: don't reach for the stars!
9 January 2023
Loubna Ben Allal
Raymond Li
Denis Kocetkov
Chenghao Mou
Christopher Akiki
Carlos Muñoz Ferrandis
Niklas Muennighoff
Mayank Mishra
A. Gu
Manan Dey
Logesh Kumar Umapathi
Carolyn Jane Anderson
Yangtian Zi
J. Lamy-Poirier
Hailey Schoelkopf
S. Troshin
Dmitry Abulkhanov
Manuel Romero
M. Lappert
F. Toni
Bernardo García del Río
Qian Liu
Shamik Bose
Urvashi Bhattacharyya
Terry Yue Zhuo
I. Yu
Paulo Villegas
Marco Zocca
Sourab Mangrulkar
D. Lansky
Huu Nguyen
Danish Contractor
Luisa Villa
Jia Li
Dzmitry Bahdanau
Yacine Jernite
Sean M. Hughes
Daniel Fried
Arjun Guha
H. D. Vries
Leandro von Werra
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SantaCoder: don't reach for the stars!"
24 / 24 papers shown
Title
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Md. Ashraful Islam
Mohammed Eunus Ali
Md. Rizwan Parvez
LLMAG
66
2
0
08 Feb 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
114
13
0
20 Jan 2025
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
75
17
0
07 Nov 2024
MdEval: Massively Multilingual Code Debugging
Shukai Liu
Linzheng Chai
Jian Yang
Jiajun Shi
He Zhu
...
Yu Hao
Liqun Yang
Guanglin Niu
Ge Zhang
Z. Li
LRM
ELM
70
6
0
04 Nov 2024
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
Nan Xu
Xuezhe Ma
LRM
36
3
0
18 Oct 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
65
128
0
22 Jun 2024
Large Language Models Meet NLP: A Survey
Libo Qin
Qiguang Chen
Xiachong Feng
Yang Wu
Yongheng Zhang
Yinghui Li
Min Li
Wanxiang Che
Philip S. Yu
ALM
LM&MA
ELM
LRM
38
46
0
21 May 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
Text-to-Code Generation with Modality-relative Pre-training
Fenia Christopoulou
Guchun Zhang
Gerasimos Lampouras
AI4TS
13
1
0
08 Feb 2024
UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing
Yifeng He
Jiabo Huang
Yuyang Rong
Yiwen Guo
Ethan Wang
Hao Chen
19
4
0
04 Feb 2024
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
33
20
0
03 Sep 2023
Deduplicating and Ranking Solution Programs for Suggesting Reference Solutions
Atsushi Shirafuji
Yutaka Watanobe
19
1
0
16 Jul 2023
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
M. Wong
Shangxin Guo
Ching Nam Hang
Siu-Wai Ho
C. Tan
28
78
0
04 Jul 2023
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Erik Nijkamp
A. Ghobadzadeh
Caiming Xiong
Silvio Savarese
Yingbo Zhou
147
164
0
03 May 2023
Kartezio: Evolutionary Design of Explainable Pipelines for Biomedical Image Analysis
Kévin Cortacero
Brienne A. McKenzie
S. Muller
Roxana Khazen
Fanny Lafouresse
...
H. Luga
Oskar Staufer
Michael L. Dustin
S. Valitutti
Sylvain Cussat-Blanc
MedIm
11
15
0
28 Feb 2023
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Shuyan Zhou
Uri Alon
Sumit Agarwal
Graham Neubig
ELM
ALM
22
98
0
10 Feb 2023
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun
Sanjay Krishna Gouda
Zijian Wang
Xiaopeng Li
Yuchen Tian
...
Baishakhi Ray
Parminder Bhatia
Sudipta Sengupta
Dan Roth
Bing Xiang
ELM
112
160
0
26 Oct 2022
MTEB: Massive Text Embedding Benchmark
Niklas Muennighoff
Nouamane Tazi
L. Magne
Nils Reimers
21
369
0
13 Oct 2022
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
202
628
0
26 Feb 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
210
1,485
0
02 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
623
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
196
1,103
0
09 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,986
0
31 Dec 2020
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
234
0
31 Dec 2020
1