Lexically Grounded Subword Segmentation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Jindřich Libovický, Jindřich Helcl |
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation. International Conference on Language Resources and Evaluation (LREC), 2024 |
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning. Neural Information Processing Systems (NeurIPS), 2023 |
Should you marginalize over possible tokenizations? Annual Meeting of the Association for Computational Linguistics (ACL), 2023 |
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 |
What changes when you randomly choose BPE merge operations? Not much. First Workshop on Insights from Negative Results in NLP (Insights), 2023 |
Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 |
Incorporating Context into Subword Vocabularies. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022 |
How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation? Conference of the Association for Machine Translation in the Americas (AMTA), 2022 |
The SIGMORPHON 2022 Shared Task on Morpheme Segmentation. Special Interest Group on Computational Morphology and Phonology Workshop (SIGMORPHON), 2022 |
Local Byte Fusion for Neural Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 |
Improving Tokenisation by Alternative Treatment of Spaces. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 |
You should evaluate your language model on marginal likelihood over tokenisations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 |
Survey of Low-Resource Machine Translation. Computational Linguistics (CL), 2021 |
How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation. Findings (Findings), 2021 |
Joint Optimization of Tokenization and Downstream Model. Findings (Findings), 2021 |
Multi-view Subword Regularization. North American Chapter of the Association for Computational Linguistics (NAACL), 2021 |
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition. Italian National Conference on Sensors (INS), 2021 |