2305.14327
Cited By
v1
v2 (latest)
Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Da Yin
Xiao Liu
Fan Yin
Ming Zhong
Hritik Bansal
Jiawei Han
Kai-Wei Chang
ALM
Github (64★)
Papers citing
"Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation"
21 / 21 papers shown
Title
A Survey on Efficient Large Language Model Training: From Data-centric Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junyu Luo
Bohan Wu
Xiao Luo
Zhiping Xiao
Yiqiao Jin
...
Nan Yin
Yifan Wang
Jingyang Yuan
Wei Ju
Ming Zhang
108
3
0
29 Oct 2025
LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
Y. Wang
Da Yin
Yuedong Cui
Ruichen Zheng
Zhiqian Li
...
Di Wu
X. Wu
Chenchen Ye
Yu Zhou
Kai-Wei Chang
LLMAG
76
1
0
16 Oct 2025
SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
J. Lin
Zhongruo Wang
Kun Qian
Tian Wang
Arvind Srinivasan
...
Weiqi Zhang
Sujay Sanghavi
C. L. P. Chen
Hyokun Yun
Lihong Li
CLL
238
1
0
25 Sep 2025
TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mingyu Zheng
Zhifan Feng
Jia Wang
Lanrui Wang
Zheng Lin
Yang Hao
Weiping Wang
LMTD
169
1
0
10 Jun 2025
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
A. Lupidi
Carlos Gemmell
Nicola Cancedda
Jane Dwivedi-Yu
Jason Weston
Jakob Foerster
Roberta Raileanu
Maria Lomeli
SyDa
343
21
0
12 Sep 2024
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
Ingo Ziegler
Abdullatif Köksal
Desmond Elliott
Hinrich Schütze
172
9
0
03 Sep 2024
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions
AAAI Conference on Artificial Intelligence (AAAI), 2024
Matan Levi
Yair Allouche
Daniel Ohayon
Anton Puzanov
196
12
0
17 Aug 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Yushun Zhang
Zhihang Lin
Haozhe Zhang
Tian Ding
Ruoyu Sun
CLL
333
7
0
30 Jul 2024
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models
Ziche Liu
Rui Ke
Feng Jiang
Haizhou Li
258
8
0
20 Jun 2024
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Yuntian Deng
Radha Poovendran
Yejin Choi
Bill Yuchen Lin
SyDa
291
240
0
12 Jun 2024
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
Aman Rangapur
Kejian Shi
William Merrill
Aakanksha Naik
Shruti Singh
...
Luca Soldaini
Shannon Zejiang Shen
Doug Downey
Hannaneh Hajishirzi
Arman Cohan
367
21
0
10 Jun 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
306
15
0
31 Mar 2024
SMART: Submodular Data Mixture Strategy for Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Kowndinya Renduchintala
S. Bhatia
Ganesh Ramakrishnan
190
11
0
13 Mar 2024
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Ashok Urlana
Charaka Vinayak Kumar
Ajeet Kumar Singh
B. Garlapati
S. Chalamala
Rahul Mishra
336
18
0
22 Feb 2024
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
International Conference on Learning Representations (ICLR), 2023
Renze Lou
Kai Zhang
Jian Xie
Yuxuan Sun
Janice Ahn
Hanzi Xu
Yu Su
Wenpeng Yin
219
36
0
05 Dec 2023
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Siru Ouyang
Shuohang Wang
Yang Liu
Ming Zhong
Yizhu Jiao
Dan Iter
Reid Pryzant
Chenguang Zhu
Heng Ji
Jiawei Han
197
48
0
19 Oct 2023
Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration
Fanqi Wan
Xinting Huang
Tao Yang
Xiaojun Quan
Wei Bi
Shuming Shi
ALM
246
27
0
13 Oct 2023
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chang Gao
Wenxuan Zhang
Guizhen Chen
Wai Lam
606
8
0
04 Oct 2023
DoG-Instruct: Towards Premium Instruction-Tuning Data via Text-Grounded Instruction Wrapping
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yongrui Chen
Haiyun Jiang
Xinting Huang
Shuming Shi
Guilin Qi
SyDa
172
13
0
11 Sep 2023
Harnessing the Power of David against Goliath: Exploring Instruction Data Generation without Using Closed-Source Models
Yue Wang
Xinrui Wang
Juntao Li
Jinxiong Chang
Qishen Zhang
Zhongyi Liu
Guannan Zhang
Min Zhang
ALM
83
8
0
24 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
367
97
0
12 Aug 2023