How Much Does Tokenization Affect Neural Machine Translation?

20 December 2018
Miguel Domingo
Mercedes García-Martínez
A. Helle
F. Casacuberta
Manuel Herranz

Papers citing "How Much Does Tokenization Affect Neural Machine Translation?"

29 / 29 papers shown
Doğal Dil İşlemede Tokenizasyon Standartları ve Ölçümü: Türkçe Üzerinden Büyük Dil Modellerinin Karşılaştırmalı Analizi
Signal Processing and Communications Applications Conference (SIU), 2025
M. Ali Bayram
Ali Arda Fincan
Ahmet Semih Gümüş
Sercan Karakaş
Banu Diri
Savaş Yıldırım
99
1
0
18 Aug 2025
Experimental Evaluation of Dynamic Topic Modeling Algorithms
Ngozichukwuka Onah
Nadine Steinmetz
Hani Al-Sayeh
K. Sattler
161
0
0
01 Aug 2025
Tokenization Multiplicity Leads to Arbitrary Price Variation in LLM-as-a-service
Ivi Chatzi
N. C. Benz
Stratis Tsirtsis
Manuel Gomez Rodriguez
196
1
0
06 Jun 2025
Beyond Text Compression: Evaluating Tokenizers Across Scales
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jonas F. Lotz
António V. Lopes
Stephan Peitz
Hendra Setiawan
Leonardo Emili
340
3
0
03 Jun 2025
Morphological Typology in BPE Subword Productivity and Language Modeling
Iñigo Parra
197
2
0
31 Oct 2024
Unsupervised Morphological Tree Tokenizer
Qingyang Zhu
Xiang Hu
Pengyu Ji
Wei Wu
Kewei Tu
351
0
0
21 Jun 2024
Revisiting subword tokenization: A case study on affixal negation in large language models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Thinh Hung Truong
Yulia Otmakhova
Karin Verspoor
Trevor Cohn
Timothy Baldwin
299
4
0
03 Apr 2024
Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!
Niyati Bafna
Philipp Koehn
David Yarowsky
433
1
0
16 Mar 2024
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
Anaelia Ovalle
Ninareh Mehrabi
Palash Goyal
Jwala Dhamala
Kai-Wei Chang
Richard Zemel
Aram Galstyan
Yuval Pinter
Rahul Gupta
383
16
0
19 Dec 2023
Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Bar Iluz
Tomasz Limisiewicz
Gabriel Stanovsky
David Mareček
271
7
0
21 Sep 2023
MorphPiece: A Linguistic Tokenizer for Large Language Models
Jeffrey Hsu
248
9
0
14 Jul 2023
Tokenization with Factorized Subword Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
David Samuel
Lilja Øvrelid
246
3
0
13 Jun 2023
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Benjamin Minixhofer
Jonas Pfeiffer
Ivan Vulić
325
11
0
23 May 2023
Beqi: Revitalize the Senegalese Wolof Language with a Robust Spelling Corrector
Derguene Mbaye
Moussa Diallo
155
4
0
15 May 2023
Low-Resourced Machine Translation for Senegalese Wolof Language
Derguene Mbaye
Moussa Diallo
T. Diop
203
5
0
01 May 2023
FLAME: A small language model for spreadsheet formulas
AAAI Conference on Artificial Intelligence (AAAI), 2023
Harshit Joshi
Abishai Ebenezer
J. Cambronero
Sumit Gulwani
Aditya Kanade
Vu Le
Ivan Radiček
Gust Verbruggen
LMTD
407
21
0
31 Jan 2023
Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models
Najoung Kim
Tal Linzen
P. Smolensky
292
33
0
21 Dec 2022
Improving Multilingual Neural Machine Translation System for Indic Languages
Sudhansu Bala Das
Atharv Biradar
Tapas Kumar Mishra
B. Patra
316
50
0
27 Sep 2022
How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?
Conference of the Association for Machine Translation in the Americas (AMTA), 2022
Ali Araabi
Christof Monz
Vlad Niculae
268
14
0
10 Aug 2022
The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Special Interest Group on Computational Morphology and Phonology Workshop (SIGMORPHON), 2022
Khuyagbaatar Batsuren
Gábor Bella
Aryaman Arora
Viktor Martinović
Kyle Gorman
...
Magda Ševčíková
Kateřina Pelegrinová
Fausto Giunchiglia
Robert Bamler
Ekaterina Vylomova
248
51
0
15 Jun 2022
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
Conference of the Association for Machine Translation in the Americas (AMTA), 2022
Shiyue Zhang
Vishrav Chaudhary
Naman Goyal
James Cross
Guillaume Wenzek
Joey Tianyi Zhou
Francisco Guzman
292
23
0
29 Apr 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
359
205
0
20 Dec 2021
Joint Optimization of Tokenization and Downstream Model
Findings (Findings), 2021
Tatsuya Hiraoka
Sho Takase
Kei Uchiumi
Atsushi Keyaki
Naoaki Okazaki
206
19
0
26 May 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
538
347
0
31 Dec 2020
SubICap: Towards Subword-informed Image Captioning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Naeha Sharif
Bennamoun
Wei Liu
Syed Afaq Ali Shah
167
2
0
24 Dec 2020
Towards Machine Translation for the Kurdish Language
Sina Ahmadi
Mariam Masoud
233
13
0
12 Oct 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Findings (Findings), 2020
Kaj Bostrom
Greg Durrett
311
287
0
07 Apr 2020
Urdu-English Machine Transliteration using Neural Networks
Usman Mohy ud Din
112
2
0
12 Jan 2020
Neural Machine Translation: A Review and Survey
Journal of Artificial Intelligence Research (JAIR), 2019
Felix Stahlberg
3DV, AI4TS, MedIm
484
400
0
04 Dec 2019