Mastering the Craft of Data Synthesis for CodeLLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
16 October 2024
Meng Chen
Philip Arthur
Qianyu Feng
Cong Duy Vu Hoang
Yu-Heng Hong
Mahdi Kazemi Moghaddam
Omid Nezami
Tien N Nguyen
Gioacchino Tangari
Duy Vu
Thanh Tien Vu
Mark Johnson
Kemal Kurniawan
Don Dharmasiri
Long Duong
Yuan-Fang Li
    SyDa

Papers citing "Mastering the Craft of Data Synthesis for CodeLLMs"

14 / 64 papers shown
Large Language Models Meet NL2Code: A Survey
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Daoguang Zan
B. Chen
Fengji Zhang
Di Lu
Bingchao Wu
Bei Guan
Yongji Wang
Jian-Guang Lou
ELM, ALM
237
236
0
19 Dec 2022
CodeExp: Explanatory Code Document Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Haotian Cui
Chenglong Wang
Junjie Huang
J. Inala
Todd Mytkowicz
Bolong Wang
Jian Gao
Nan Duan
158
6
0
25 Nov 2022
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
International Conference on Machine Learning (ICML), 2022
Yuhang Lai
Chengxi Li
Yiming Wang
Tianyi Zhang
Ruiqi Zhong
Luke Zettlemoyer
Scott Yih
Daniel Fried
Si-yi Wang
Tao Yu
ELM, ALM
275
443
0
18 Nov 2022
Language Models Can Teach Themselves to Program Better
International Conference on Learning Representations (ICLR), 2022
Patrick M. Haluptzok
Matthew Bowers
Adam Tauman Kalai
ReLM, SyDa, LRM
362
98
0
29 Jul 2022
Compilable Neural Code Generation with Compiler Feedback
Findings (Findings), 2022
Xin Wang
Yasheng Wang
Yao Wan
Fei Mi
Yitong Li
Pingyi Zhou
Jin Liu
Hao Wu
Xin Jiang
Qun Liu
209
86
0
10 Mar 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
1.1K
6,810
0
27 Oct 2021
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM, AIMat, ReCod, ALM
419
2,869
0
16 Aug 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
717
770
0
14 Jul 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
2.1K
7,722
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM
1.2K
910
0
20 May 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
910
2,546
0
31 Dec 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
4.0K
27,917
0
26 Jul 2019
A Survey of Machine Learning for Big Code and Naturalness
Miltiadis Allamanis
Earl T. Barr
Premkumar T. Devanbu
Charles Sutton
415
941
0
18 Sep 2017
Bag of Tricks for Efficient Text Classification
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2016
Armand Joulin
Edouard Grave
Piotr Bojanowski
Tomas Mikolov
VLM
1.2K
4,888
0
06 Jul 2016