2502.13252
Multilingual Language Model Pretraining using Machine-translated Data
20 February 2025
Jiayi Wang
Yao Lu
Maurice Weber
Max Ryabinin
David Ifeoluwa Adelani
Yihong Chen
Raphael Tang
Pontus Stenetorp
LRM
Papers citing
"Multilingual Language Model Pretraining using Machine-translated Data"
2 / 2 papers shown
Title
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns
Letitia Parcalabescu
Stephan Wäldchen
Michael Barlow
Gregor Ziegltrum
Volker Stampa
Bastian Harren
Björn Deiseroth
SyDa
26
0
0
24 Apr 2025
Compass-V2 Technical Report
Sophia Maria
MoE
LRM
27
0
0
22 Apr 2025