ResearchTrend.AI
Best Practices and Lessons Learned on Synthetic Data for Language Models

11 April 2024
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
Jinmeng Rao
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
    SyDa
    EgoV

Papers citing "Best Practices and Lessons Learned on Synthetic Data for Language Models"

32 / 82 papers shown
Title
Abstraction-of-Thought Makes Language Models Better Reasoners
Ruixin Hong
Hongming Zhang
Xiaoman Pan
Dong Yu
Changshui Zhang
LRM
34
3
0
18 Jun 2024
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models
Jie Chen
Yupeng Zhang
Bingning Wang
Wayne Xin Zhao
Ji-Rong Wen
Weipeng Chen
SyDa
27
4
0
18 Jun 2024
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Zilun Zhang
Yutao Sun
Tiancheng Zhao
Leigang Sha
Ruochen Xu
Kyusong Lee
Jianwei Yin
CLL
KELM
40
0
0
17 Jun 2024
GNOME: Generating Negotiations through Open-Domain Mapping of Exchanges
Darshan Deshpande
Shambhavi Sinha
Anirudh Ravi Kumar
Debaditya Pal
Jonathan May
AI4CE
31
0
0
16 Jun 2024
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
Lin Long
Rui Wang
Ruixuan Xiao
Junbo Zhao
Xiao Ding
Gang Chen
Haobo Wang
SyDa
45
88
0
14 Jun 2024
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Yuntian Deng
Radha Poovendran
Yejin Choi
Bill Yuchen Lin
SyDa
18
110
0
12 Jun 2024
Improving Text Generation on Images with Synthetic Captions
Jun Young Koh
Sang Hyun Park
Joy Song
DiffM
41
2
0
01 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
...
Tong Bill Xu
Xiawu Zheng
Enhong Chen
Rongrong Ji
Xing Sun
VLM
MLLM
34
216
0
31 May 2024
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Shenao Zhang
Donghan Yu
Hiteshi Sharma
Ziyi Yang
Shuohang Wang
Hany Hassan
Zhaoran Wang
LRM
23
28
0
29 May 2024
Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity
Shanghaoran Quan
24
3
0
26 May 2024
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
Kun Zhou
Beichen Zhang
Jiapeng Wang
Zhipeng Chen
Wayne Xin Zhao
Jing Sha
Zhichao Sheng
Shijin Wang
Ji-Rong Wen
SyDa
LRM
30
29
0
23 May 2024
Annotation-Efficient Preference Optimization for Language Model Alignment
Yuu Jinnai
Ukyo Honda
33
0
0
22 May 2024
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Jordi Bayarri-Planas
Adrián Tormos
Daniel Hinjos
...
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Sergio Álvarez Napagao
Eduard Ayguadé-Parra
Ulises Cortés
Dario Garcia-Gasulla
ELM
LM&MA
24
9
0
03 May 2024
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Victor Carbune
Hassan Mansoor
Fangyu Liu
Rahul Aralikatte
Gilles Baechler
Jindong Chen
Abhanshu Sharma
ReLM
LRM
119
7
0
19 Mar 2024
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Hugo Laurençon
Léo Tronchon
Victor Sanh
VLM
47
13
0
14 Mar 2024
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Nihal V. Nayak
Yiyang Nan
Avi Trost
Stephen H. Bach
SyDa
25
5
0
28 Feb 2024
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Subbarao Kambhampati
Karthik Valmeekam
L. Guan
Mudit Verma
Kaya Stechly
Siddhant Bhambri
Lucas Saldyt
Anil Murthy
LRM
78
107
0
02 Feb 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
215
291
0
18 Jan 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
207
178
0
20 Oct 2023
Synthetic data, real errors: how (not) to publish and use synthetic data
B. V. Breugel
Zhaozhi Qian
M. Schaar
SyDa
44
28
0
16 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
204
1,701
0
07 Apr 2023
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
106
78
0
11 Oct 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
148
259
0
07 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
213
327
0
23 Aug 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
S. Hoi
SyDa
ALM
116
232
0
05 Jul 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Back-translation for Large-Scale Multilingual Machine Translation
Baohao Liao
Shahram Khadivi
Sanjika Hewavitharana
22
16
0
17 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
Arij Riabi
Thomas Scialom
Rachel Keraron
Benoît Sagot
Djamé Seddah
Jacopo Staiano
124
51
0
23 Oct 2020
Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints
Zhenyi Wang
Xiaoyang Wang
Bang An
Dong Yu
Changyou Chen
LMTD
146
84
0
03 May 2020