ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.10429
  4. Cited By
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

17 May 2023
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
    MoMe
    MoE
ArXivPDFHTML

Papers citing "DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining"

50 / 143 papers shown
Title
What is the Role of Small Models in the LLM Era: A Survey
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
54
23
0
10 Sep 2024
Improving Pretraining Data Using Perplexity Correlations
Improving Pretraining Data Using Perplexity Correlations
Tristan Thrush
Christopher Potts
Tatsunori Hashimoto
32
17
0
09 Sep 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and
  Deduplication by Introducing a Competitive Large Language Model Baseline
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline
Guosheng Dong
Da Pan
Yiding Sun
Shusen Zhang
Zheng Liang
...
Bingning Wang
Wentao Zhang
Jiaxin Mao
Zenan Zhou
Weipeng Chen
ALM
27
2
0
27 Aug 2024
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna
Chethan Bhateja
Yichen Jian
Karl Pertsch
Dorsa Sadigh
23
11
0
26 Aug 2024
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Qianqian Xie
Dong Li
Mengxi Xiao
Zihao Jiang
Ruoyu Xiang
...
Benyou Wang
Alejandro Lopez-Lira
Qianqian Xie
Sophia Ananiadou
Junichi Tsujii
AIFin
AI4TS
30
13
0
20 Aug 2024
Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval
Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval
Guangyuan Ma
Yongliang Ma
Xing Wu
Zhenpeng Su
Ming Zhou
Songlin Hu
OOD
30
2
0
20 Aug 2024
The Data Addition Dilemma
The Data Addition Dilemma
Judy Hanwen Shen
Inioluwa Deborah Raji
Irene Y. Chen
24
5
0
08 Aug 2024
EXAONE 3.0 7.8B Instruction Tuned Language Model
EXAONE 3.0 7.8B Instruction Tuned Language Model
LG AI Research
:
Soyoung An
Kyunghoon Bae
Eunbi Choi
...
Boseong Seo
Sihoon Yang
Heuiyeen Yeen
Kyungjae Yoo
Hyeongu Yun
ELM
ALM
38
10
0
07 Aug 2024
Towards Effective and Efficient Continual Pre-training of Large Language
  Models
Towards Effective and Efficient Continual Pre-training of Large Language Models
Jie Chen
Zhipeng Chen
Jiapeng Wang
Kun Zhou
Yutao Zhu
...
Rui Yan
Zhewei Wei
Di Hu
Wenbing Huang
Ji-Rong Wen
KELM
ALM
CLL
ELM
LRM
35
4
0
26 Jul 2024
DDK: Distilling Domain Knowledge for Efficient Large Language Models
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Jiaheng Liu
Chenchen Zhang
Jinyang Guo
Yuanxing Zhang
Haoran Que
...
Congnan Liu
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
43
3
0
23 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a
  Micro-Budget
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
32
8
0
22 Jul 2024
MaskMoE: Boosting Token-Level Learning via Routing Mask in
  Mixture-of-Experts
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts
Zhenpeng Su
Zijia Lin
Xue Bai
Xing Wu
Yizhe Xiong
...
Guangyuan Ma
Hui Chen
Guiguang Ding
Wei Zhou
Songlin Hu
MoE
23
4
0
13 Jul 2024
Mitigating Catastrophic Forgetting in Language Transfer via Model
  Merging
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
Anton Alexandrov
Veselin Raychev
Mark Niklas Muller
Ce Zhang
Martin Vechev
Kristina Toutanova
MoMe
CLL
KELM
25
13
0
11 Jul 2024
SoupLM: Model Integration in Large Language and Multi-Modal Models
SoupLM: Model Integration in Large Language and Multi-Modal Models
Yue Bai
Zichen Zhang
Jiasen Lu
Yun Fu
MoMe
22
1
0
11 Jul 2024
SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language
  Model Pre-training
SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training
Nan He
Weichen Xiong
Hanwen Liu
Yi Liao
Lei Ding
Kai Zhang
Guohua Tang
Xiao Han
Wei Yang
37
1
0
09 Jul 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
Bo Liu
Aastha Jhunjhunwala
Zhilin Wang
M. Patwary
M. Shoeybi
Bryan Catanzaro
26
5
0
08 Jul 2024
LLMBox: A Comprehensive Library for Large Language Models
LLMBox: A Comprehensive Library for Large Language Models
Tianyi Tang
Yiwen Hu
Bingqian Li
Wenyang Luo
Zijing Qin
...
Chunxuan Xia
Junyi Li
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
26
1
0
08 Jul 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
55
34
1
01 Jul 2024
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Rui Pan
Jipeng Zhang
Xingyuan Pan
Renjie Pi
Xiaoyu Wang
Tong Zhang
45
5
0
28 Jun 2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual
  Pre-training
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Tong Zhu
Xiaoye Qu
Daize Dong
Jiacheng Ruan
Jingqi Tong
Conghui He
Yu Cheng
MoE
ALM
38
69
0
24 Jun 2024
DEM: Distribution Edited Model for Training with Mixed Data
  Distributions
DEM: Distribution Edited Model for Training with Mixed Data Distributions
Dhananjay Ram
Aditya Rawal
Momchil Hardalov
Nikolaos Pappas
Sheng Zha
MoMe
25
1
0
21 Jun 2024
Efficient Continual Pre-training by Mitigating the Stability Gap
Efficient Continual Pre-training by Mitigating the Stability Gap
Yiduo Guo
Jie Fu
Huishuai Zhang
Dongyan Zhao
Yikang Shen
30
12
0
21 Jun 2024
Instruction Pre-Training: Language Models are Supervised Multitask
  Learners
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Daixuan Cheng
Yuxian Gu
Shaohan Huang
Junyu Bi
Minlie Huang
Furu Wei
SyDa
51
20
0
20 Jun 2024
Data-Centric AI in the Age of Large Language Models
Data-Centric AI in the Age of Large Language Models
Xinyi Xu
Zhaoxuan Wu
Rui Qiao
Arun Verma
Yao Shu
...
Xiaoqiang Lin
Wenyang Hu
Zhongxiang Dai
Pang Wei Koh
Bryan Kian Hsiang Low
ALM
40
2
0
20 Jun 2024
Low-Redundant Optimization for Large Language Model Alignment
Low-Redundant Optimization for Large Language Model Alignment
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Jingyuan Wang
Ji-Rong Wen
29
2
0
18 Jun 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu
Daize Dong
Xiaoye Qu
Jiacheng Ruan
Wenliang Chen
Yu Cheng
MoE
37
7
0
17 Jun 2024
Recent Advances in Federated Learning Driven Large Language Models: A Survey on Architecture, Performance, and Security
Recent Advances in Federated Learning Driven Large Language Models: A Survey on Architecture, Performance, and Security
Youyang Qu
Ming Liu
Tianqing Zhu
Longxiang Gao
Shui Yu
Wanlei Zhou
MU
FedML
52
2
0
14 Jun 2024
Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large
  Language Models
Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models
Minghao Wu
Thuy-Trang Vu
Lizhen Qu
Gholamreza Haffari
21
4
0
13 Jun 2024
Large Language Model-guided Document Selection
Large Language Model-guided Document Selection
Xiang Kong
Tom Gunter
Ruoming Pang
28
4
0
07 Jun 2024
Does your data spark joy? Performance gains from domain upsampling at
  the end of training
Does your data spark joy? Performance gains from domain upsampling at the end of training
Cody Blakeney
Mansheej Paul
Brett W. Larsen
Sean Owen
Jonathan Frankle
16
19
0
05 Jun 2024
Zyda: A 1.3T Dataset for Open Language Modeling
Zyda: A 1.3T Dataset for Open Language Modeling
Yury Tokpanov
Beren Millidge
Paolo Glorioso
Jonathan Pilault
Adam Ibrahim
James Whittington
Quentin Anthony
27
2
0
04 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small
  Reference Models
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
30
23
0
30 May 2024
Group Robust Preference Optimization in Reward-free RLHF
Group Robust Preference Optimization in Reward-free RLHF
Shyam Sundhar Ramesh
Yifan Hu
Iason Chaimalas
Viraj Mehta
Pier Giuseppe Sessa
Haitham Bou-Ammar
Ilija Bogunovic
14
13
0
30 May 2024
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of
  Large Language Model
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
Chaochen Gao
Xing Wu
Qingfang Fu
Songlin Hu
SyDa
24
3
0
30 May 2024
Zamba: A Compact 7B SSM Hybrid Model
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
19
7
0
26 May 2024
gzip Predicts Data-dependent Scaling Laws
gzip Predicts Data-dependent Scaling Laws
Rohan Pandey
14
9
0
26 May 2024
A Survey of Multimodal Large Language Model from A Data-centric
  Perspective
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
31
0
26 May 2024
360Zhinao Technical Report
360Zhinao Technical Report
360Zhinao Team
32
0
0
22 May 2024
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage
  Pruning
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao
Yi Su
Pinzheng Wang
Jing Ye
Wen Xie
...
Wenliang Chen
Guohong Fu
Guodong Zhou
Qiaoming Zhu
Min Zhang
MQ
32
0
0
09 May 2024
Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective
  Scaffold Token Removal
Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal
Haoran Lian
Yizhe Xiong
Jianwei Niu
Shasha Mo
Zhenpeng Su
Zijia Lin
Peng Liu
Hui Chen
Guiguang Ding
21
1
0
27 Apr 2024
Temporal Scaling Law for Large Language Models
Temporal Scaling Law for Large Language Models
Yizhe Xiong
Xiansheng Chen
Xin Ye
Hui Chen
Zijia Lin
Haoran Lian
Zhenpeng Su
Jianwei Niu
Guiguang Ding
30
9
0
27 Apr 2024
Nyonic Technical Report
Nyonic Technical Report
Junfeng Tian
Rui-cang Wang
Cong Li
Yudong Zhou
Jun Liu
Jun Wang
28
0
0
24 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from
  Human Feedback for LLMs
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
A. Kalyan
Karthik Narasimhan
A. Deshpande
Bruno Castro da Silva
21
33
0
12 Apr 2024
Rho-1: Not All Tokens Are What You Need
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
46
53
0
11 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable
  Training Strategies
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
38
275
0
09 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
43
73
0
08 Apr 2024
Enhancing Reasoning Capacity of SLM using Cognitive Enhancement
Enhancing Reasoning Capacity of SLM using Cognitive Enhancement
Jonathan Pan
Swee Liang Wong
Xin Wei Chia
Yidi Yuan
LRM
30
0
0
01 Apr 2024
Fairness in Large Language Models: A Taxonomic Survey
Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu
Zichong Wang
Wenbin Zhang
AILaw
33
31
0
31 Mar 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
37
58
0
25 Mar 2024
Unraveling the Mystery of Scaling Laws: Part I
Unraveling the Mystery of Scaling Laws: Part I
Hui Su
Zhi Tian
Xiaoyu Shen
Xunliang Cai
26
19
0
11 Mar 2024
Previous
123
Next