Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

3 May 2020

Papers citing "Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation"

Lexically Grounded Subword Segmentation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Jindřich Libovický

Jindřich Helcl

336

19 Jun 2024

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

Hui Chen

339

27 Apr 2024

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

Khuyagbaatar Batsuren

Ekaterina Vylomova

Verna Dankers

Tsetsuukhei Delgerbaatar

Omri Uzan

Yuval Pinter

Gábor Bella

210

20 Apr 2024

Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation

International Conference on Language Resources and Evaluation (LREC), 2024

Francois Meyer

Jan Buys

206

12 Mar 2024

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

305

02 Mar 2024

Tokenization Is More Than Compression

435

28 Feb 2024

Two Counterexamples to Tokenization and the Noiseless Channel

199

22 Feb 2024

Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2023

250

08 Sep 2023

SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation

Sadao Kurohashi

189

31 Jul 2023

Should you marginalize over possible tokenizations?

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

262

30 Jun 2023

Evolution of Efficient Symbolic Communication Codes

Anton Kolonin

146

04 Jun 2023

Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Francois Meyer

Jan Buys

242

11 May 2023

What changes when you randomly choose BPE merge operations? Not much

First Workshop on Insights from Negative Results in NLP (Insights), 2023

Jonne Saleva

Constantine Lignos

177

04 May 2023

Tokenization Preference for Human and Machine Learning Model: An Annotation Study

Tatsuya Hiraoka

Tomoya Iwakura

203

21 Apr 2023

Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing

Tatsuya Hiraoka

Tomoya Iwakura

156

21 Apr 2023

Elementwise Language Representation

Du-Yeong Kim

Jeeeun Kim

240

27 Feb 2023

Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

270

19 Dec 2022

Extending the Subwording Model of Multilingual Pretrained Models for New Languages

K. Imamura

Eiichiro Sumita

248

29 Nov 2022

Incorporating Context into Subword Vocabularies

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Shaked Yehezkel

Yuval Pinter

257

13 Oct 2022

How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?

Conference of the Association for Machine Translation in the Americas (AMTA), 2022

Ali Araabi

Christof Monz

Vlad Niculae

271

10 Aug 2022

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Special Interest Group on Computational Morphology and Phonology Workshop (SIGMORPHON), 2022

Khuyagbaatar Batsuren

...

Kateřina Pelegrinová

Fausto Giunchiglia

Robert Bamler

Ekaterina Vylomova

248

15 Jun 2022

Local Byte Fusion for Neural Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Makesh Narsimhan Sreedhar

Xiangpeng Wan

Yu-Jie Cheng

Junjie Hu

612

23 May 2022

Improving Tokenisation by Alternative Treatment of Spaces

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Edward Gow-Smith

Harish Tayyar Madabushi

Carolina Scarton

Aline Villavicencio

279

08 Apr 2022

LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation

161

28 Feb 2022

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

...

371

207

20 Dec 2021

You should evaluate your language model on marginal likelihood over tokenisations

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Kris Cao

Laura Rimell

305

06 Sep 2021

Survey of Low-Resource Machine Translation

Computational Linguistics (CL), 2021

Barry Haddow

Rachel Bawden

Antonio Valerio Miceli Barone

Jindřich Helcl

Alexandra Birch

646

225

01 Sep 2021

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

Zhen Qin

411

197

23 Jun 2021

How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation

Findings (Findings), 2021

Beatrice Savoldi

233

28 May 2021

Joint Optimization of Tokenization and Downstream Model

Findings (Findings), 2021

207

26 May 2021

Multi-view Subword Regularization

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Xinyi Wang

Sebastian Ruder

Graham Neubig

314

15 Mar 2021

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Italian National Conference on Sensors (INS), 2021

Ivan Medennikov

177

12 Mar 2021