Beyond Text Compression: Evaluating Tokenizers Across Scales. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Multilingual Pretraining for Pixel Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025