
Causal Estimation of Tokenisation Bias. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tokenization is Sensitive to Language Variation. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch. Annual Meeting of the Association for Computational Linguistics (ACL), 2024
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024