Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.07384
Cited By
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
12 March 2024
Yu Yang
Siddhartha Mishra
Jeffrey N Chiang
Baharan Mirzasoleiman
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models"
7 / 7 papers shown
Title
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Madhav Kotecha
Vijendra Kumar Vaishya
Smita Gautam
Suraj Racha
12
0
0
02 May 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
111
3
0
06 Feb 2025
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
45
34
1
01 Jul 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
22
8
0
16 Jun 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty
D. Sivasubramanian
Ganesh Ramakrishnan
A. De
Rishabh K. Iyer
OOD
75
184
0
27 Feb 2021
1