
Title |
|---|
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages. Neural Information Processing Systems (NeurIPS), 2024 |
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
data2lang2vec: Data Driven Typological Features Completion. International Conference on Computational Linguistics (COLING), 2024 |
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
GlotLID: Language Identification for Low-Resource Languages. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 |