2201.10474
Cited By
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
25 January 2022
Suchin Gururangan
Dallas Card
Sarah K. Drier
E. K. Gade
Leroy Z. Wang
Zeyu Wang
Luke Zettlemoyer
Noah A. Smith
Papers citing "Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection"
1 / 1 papers shown
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
220
1,508
0
31 Dec 2020