2312.02406
Efficient Online Data Mixing For Language Model Pre-Training
5 December 2023
Alon Albalak
Liangming Pan
Colin Raffel
W. Wang
Papers citing "Efficient Online Data Mixing For Language Model Pre-Training"
26 / 26 papers shown
Title
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
Xiaoxuan Zhu
Zhouhong Gu
Baiqian Wu
Suhang Zheng
Tao Wang
Tianyu Li
Hongwei Feng
Yanghua Xiao
40
0
0
01 Apr 2025
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Thomson Yen
Andrew Siah
Haozhe Chen
Tianyi Peng
Daniel Guetta
Hongseok Namkoong
48
0
0
26 Mar 2025
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
E. Liu
Amanda Bertsch
Lintang Sutawika
Lindia Tjuatja
Patrick Fernandes
...
S.
Carolin (Haas) Lawrence
Aditi Raghunathan
Kiril Gashteovski
Graham Neubig
64
0
0
05 Mar 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Dan Graur
Viktor Gsteiger
Ana Klimovic
MoE
76
0
0
27 Feb 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
G. Wang
Minyu Gao
Shuai Yang
Ya Zhang
Lizhi He
...
Yexuan Zhang
Wanyue Li
Lu Chen
Jintao Fei
Xin Li
62
1
0
25 Feb 2025
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Alon Albalak
Duy Phung
Nathan Lile
Rafael Rafailov
Kanishk Gandhi
...
Anikait Singh
Chase Blagden
Violet Xiang
Dakota Mahan
Nick Haber
OffRL
LRM
45
4
0
24 Feb 2025
Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
Lior Belenki
Alekh Agarwal
Tianze Shi
Kristina Toutanova
MoE
46
0
0
21 Feb 2025
VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Keer Lu
Keshi Zhao
Zheng Liang
Da Pan
Shusen Zhang
...
Weipeng Chen
Zenan Zhou
Guosheng Dong
Bin Cui
Wentao Zhang
VLM
26
0
0
18 Nov 2024
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Clara Na
Ian H. Magnusson
A. Jha
Tom Sherborne
Emma Strubell
Jesse Dodge
Pradeep Dasigi
MoMe
36
4
0
21 Oct 2024
Balancing Label Quantity and Quality for Scalable Elicitation
Alex Troy Mallen
Nora Belrose
25
1
0
17 Oct 2024
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen
Philip Arthur
Qianyu Feng
Cong Duy Vu Hoang
Yu-Heng Hong
...
Mark Johnson
K. K.
Don Dharmasiri
Long Duong
Yuan-Fang Li
SyDa
46
1
0
16 Oct 2024
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
33
15
0
15 Oct 2024
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
40
4
0
09 Oct 2024
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li
Haoran Xu
Weiting Tan
Kenton Murray
Daniel Khashabi
35
1
0
06 Oct 2024
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Chi Zhang
Huaping Zhong
Kuan Zhang
Chengliang Chai
Rui Wang
...
Lei Cao
Ju Fan
Ye Yuan
Guoren Wang
Conghui He
TDI
28
4
0
25 Sep 2024
DiPT: Enhancing LLM reasoning through diversified perspective-taking
H. Just
Mahavir Dabas
Lifu Huang
Ming Jin
Ruoxi Jia
LRM
32
1
0
10 Sep 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
Bo Liu
Aastha Jhunjhunwala
Zhilin Wang
M. Patwary
M. Shoeybi
Bryan Catanzaro
26
5
0
08 Jul 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min Lin
MoE
61
34
1
01 Jul 2024
DEM: Distribution Edited Model for Training with Mixed Data Distributions
Dhananjay Ram
Aditya Rawal
Momchil Hardalov
Nikolaos Pappas
Sheng Zha
MoMe
25
1
0
21 Jun 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu
Daize Dong
Xiaoye Qu
Jiacheng Ruan
Wenliang Chen
Yu Cheng
MoE
37
7
0
17 Jun 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
43
73
0
08 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
37
58
0
25 Mar 2024
Towards Optimal Learning of Language Models
Yuxian Gu
Li Dong
Y. Hao
Qingxiu Dong
Minlie Huang
Furu Wei
36
7
0
27 Feb 2024
DsDm: Model-Aware Dataset Selection with Datamodels
Logan Engstrom
Axel Feldmann
A. Madry
OODD
10
45
0
23 Jan 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019