ResearchTrend.AI

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows (arXiv:2402.10379)

16 February 2024
Ajay Patel, Colin Raffel, Chris Callison-Burch
SyDa · AI4CE

Papers citing "DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows"

9 papers shown
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding
Siyang Jiang, Bufang Yang, Lilin Xu, Mu Yuan, Yeerzhati Abudunuer, ..., Liekang Zeng, Hongkai Chen, Zhenyu Yan, Xiaofan Jiang, Guoliang Xing
VLM · 51 · 0 · 0 · 03 May 2025
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung, Aurora Cobo, Pablo Serna
SyDa · HILM · 58 · 0 · 0 · 24 Feb 2025
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch
63 · 3 · 0 · 16 Oct 2024
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, S. Satheesh, Eric Nyberg, M. Patwary, M. Shoeybi, Bryan Catanzaro
LRM · SyDa · ReLM · 98 · 1 · 0 · 15 Oct 2024
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
Zeyu Gan, Yong Liu
SyDa · 39 · 1 · 0 · 02 Oct 2024
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Lokesh Nagalapatti, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
ALM · 204 · 498 · 0 · 03 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang, Hung-yi Lee
ALM · LM&MA · 209 · 559 · 0 · 03 May 2023
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM · ALM · 303 · 11,730 · 0 · 04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
LRM · 205 · 1,651 · 0 · 15 Oct 2021