2402.16358
An Integrated Data Processing Framework for Pretraining Foundation Models
26 February 2024
Yiding Sun
Feng Wang
Yutao Zhu
Wayne Xin Zhao
Jiaxin Mao
Papers citing
"An Integrated Data Processing Framework for Pretraining Foundation Models"
3 / 3 papers shown
Title
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
242
1,070
0
05 Oct 2022
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
237
588
0
14 Jul 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020