A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages
1 March 2024
Catherine Arnett
Tyler A. Chang
Benjamin Bergen
Papers citing "A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages"
3 / 3 papers shown
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
38
0
0
02 Apr 2025
Goldfish: Monolingual Language Models for 350 Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
36
4
0
19 Aug 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
12
0
15 Mar 2024