2401.05883
Cited By
Generative Deduplication For Social Media Data Selection
11 January 2024
Xianming Li
Jing Li
Papers citing
"Generative Deduplication For Social Media Data Selection"
4 / 4 papers shown
Title
Automatic Document Selection for Efficient Encoder Pretraining
Yukun Feng
Patrick Xia
Benjamin Van Durme
João Sedoc
44
7
0
20 Oct 2022
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao
Yanan Zheng
Xiaocong Yang
Zhilin Yang
24
44
0
07 Nov 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020