arXiv:2407.21077
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
29 July 2024
Somshubra Majumdar
Vahid Noroozi
Mehrzad Samadi
Sean Narenthiran
Aleksander Ficek
Wasi Uddin Ahmad
Jocelyn Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
Papers citing
"Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models"
Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
Hengyuan Zhang
Shiping Yang
Xiao Liang
Chenming Shang
Yuxuan Jiang
...
Jing Xiong
Hayden Kwok-Hay So
Ruobing Xie
Angel X. Chang
Ngai Wong
13 Oct 2025
SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems
Xifeng Yao
Dongyu Lang
Wu Zhang
Xintong Guo
Huarui Xie
...
Ping Liu
Guang Shen
Yi Bai
Dandan Tu
Changzheng Zhang
16 Sep 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
05 Apr 2025
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Wasi Uddin Ahmad
Mehrzad Samadi
Somshubra Majumdar
Aleksander Ficek
Siddhartha Jain
Jocelyn Huang
Vahid Noroozi
Boris Ginsburg
LRM
02 Apr 2025
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
International Conference on Learning Representations (ICLR), 2025
Shubham Toshniwal
Wei Du
Ivan Moshkov
Branislav Kisacanin
Alexan Ayrapetyan
Igor Gitman
LRM
02 Oct 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
08 Jul 2024
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Raymond Li
Loubna Ben Allal
Federico Cassano
J. Lamy-Poirier
...
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
Harm de Vries
OSLM
ELM
29 Feb 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Yiming Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
25 Jan 2024
TACO: Topics in Algorithmic COde generation dataset
Rongao Li
Jie Fu
Bo Zhang
Tao Huang
Zhihong Sun
Chen Lyu
Guang Liu
Zhi Jin
Moe
22 Dec 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
08 Nov 2023
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
International Conference on Machine Learning (ICML), 2024
Chrisantha Fernando
Dylan Banarse
Henryk Michalewski
Simon Osindero
Tim Rocktäschel
LLMAG
ReLM
LRM
28 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Symposium on Operating Systems Principles (SOSP), 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Hao Zhang
Ion Stoica
VLM
12 Sep 2023
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
International Conference on Learning Representations (ICLR), 2024
Ziyang Luo
Can Xu
Lu Wang
Qingfeng Sun
Xiubo Geng
Wenxiang Hu
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
ELM
SyDa
ALM
14 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Neural Information Processing Systems (NeurIPS), 2023
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
02 May 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM
SyDa
LRM
20 Dec 2022
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
ALM
19 Dec 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
04 Mar 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM
AIMat
ALM
20 May 2021
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
28 May 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nils Reimers
Iryna Gurevych
21 Apr 2020
Self-training with Noisy Student improves ImageNet classification
Computer Vision and Pattern Recognition (CVPR), 2020
Qizhe Xie
Minh-Thang Luong
Eduard H. Hovy
Quoc V. Le
NoLa
11 Nov 2019
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
22 Apr 2019
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
13 Aug 2016
Adam: A Method for Stochastic Optimization
International Conference on Learning Representations (ICLR), 2015
Diederik P. Kingma
Jimmy Ba
ODL
22 Dec 2014