2404.01204
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis
1 April 2024
Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Wenhu Chen, Jie Fu, Ge Zhang
Papers citing "The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis" (4 papers shown)
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min-Bin Lin
MoE
01 Jul 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI: Xiao Bi, Deli Chen, Guanting Chen, ..., Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
LRM, ALM
05 Jan 2024
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
17 Sep 2019