ResearchTrend.AI
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models

29 July 2024
Somshubra Majumdar
Vahid Noroozi
Mehrzad Samadi
Sean Narenthiran
Aleksander Ficek
Wasi Uddin Ahmad
Jocelyn Huang
Jagadeesh Balam
Boris Ginsburg
    SyDa

Papers citing "Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models"

26 / 26 papers shown
Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
Hengyuan Zhang
Shiping Yang
Xiao Liang
Chenming Shang
Yuxuan Jiang
...
Jing Xiong
Hayden Kwok-Hay So
Ruobing Xie
Angel X. Chang
Ngai Wong
128
0
0
13 Oct 2025
SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems
Xifeng Yao
Dongyu Lang
Wu Zhang
Xintong Guo
Huarui Xie
...
Ping Liu
Guang Shen
Yi Bai
Dandan Tu
Changzheng Zhang
84
0
0
16 Sep 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
227
11
0
05 Apr 2025
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Wasi Uddin Ahmad
Mehrzad Samadi
Somshubra Majumdar
Aleksander Ficek
Siddhartha Jain
Jocelyn Huang
Vahid Noroozi
Boris Ginsburg
LRM
368
36
0
02 Apr 2025
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
International Conference on Learning Representations (ICLR), 2024
Shubham Toshniwal
Wei Du
Ivan Moshkov
Branislav Kisacanin
Alexan Ayrapetyan
Igor Gitman
LRM
344
115
0
02 Oct 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
171
4
0
08 Jul 2024
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Raymond Li
Loubna Ben Allal
Federico Cassano
J. Lamy-Poirier
...
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
H. D. Vries
OSLM, ELM
223
510
0
29 Feb 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Yiming Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
344
1,281
0
25 Jan 2024
TACO: Topics in Algorithmic COde generation dataset
Rongao Li
Jie Fu
Bo Zhang
Tao Huang
Zhihong Sun
Chen Lyu
Guang Liu
Zhi Jin
Moe
272
83
0
22 Dec 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
312
160
0
08 Nov 2023
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
International Conference on Machine Learning (ICML), 2023
Chrisantha Fernando
Dylan Banarse
Henryk Michalewski
Simon Osindero
Tim Rocktäschel
LLMAG, ReLM, LRM
262
320
0
28 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Symposium on Operating Systems Principles (SOSP), 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
1.1K
3,933
0
12 Sep 2023
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
International Conference on Learning Representations (ICLR), 2023
Ziyang Luo
Can Xu
Lu Wang
Qingfeng Sun
Xiubo Geng
Wenxiang Hu
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
ELM, SyDa, ALM
658
829
0
14 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Neural Information Processing Systems (NeurIPS), 2023
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM, ALM
1.0K
1,340
0
02 May 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM, SyDa, LRM
697
2,750
0
20 Dec 2022
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
ALM
371
432
0
19 Dec 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
1.9K
16,931
0
04 Mar 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM, AIMat, ReCod, ALM
354
2,755
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
1.0K
7,459
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM
929
873
0
20 May 2021
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
1.9K
51,164
0
28 May 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nils Reimers
Iryna Gurevych
393
1,177
0
21 Apr 2020
Self-training with Noisy Student improves ImageNet classification
Computer Vision and Pattern Recognition (CVPR), 2019
Qizhe Xie
Minh-Thang Luong
Eduard H. Hovy
Quoc V. Le
NoLa
1.0K
2,593
0
11 Nov 2019
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
392
3,681
0
22 Apr 2019
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
844
9,417
0
13 Aug 2016
Adam: A Method for Stochastic Optimization
International Conference on Learning Representations (ICLR), 2014
Diederik P. Kingma
Jimmy Ba
ODL
4.4K
160,277
0
22 Dec 2014