Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
3 May 2020
Xuanli He
Gholamreza Haffari
Mohammad Norouzi

Papers citing "Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation"

32 / 32 papers shown
Lexically Grounded Subword Segmentation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jindřich Libovický
Jindřich Helcl
336
11
0
19 Jun 2024
Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal
Haoran Lian
Yizhe Xiong
Jianwei Niu
Shasha Mo
Zhenpeng Su
Zijia Lin
Peng Liu
Hui Chen
Guiguang Ding
339
2
0
27 Apr 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
210
16
0
20 Apr 2024
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
International Conference on Language Resources and Evaluation (LREC), 2024
Francois Meyer
Jan Buys
206
6
0
12 Mar 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
305
28
0
02 Mar 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
435
76
0
28 Feb 2024
Two Counterexamples to Tokenization and the Noiseless Channel
Marco Cognetta
Vilém Zouhar
Sangwhan Moon
Naoaki Okazaki
199
7
0
22 Feb 2024
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
David Yunis
Justin Jung
Falcon Z. Dai
Matthew R. Walter
OffRL
250
2
0
08 Sep 2023
SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation
Haiyue Song
Raj Dabre
Chenhui Chu
Sadao Kurohashi
Eiichiro Sumita
189
5
0
31 Jul 2023
Should you marginalize over possible tokenizations?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Nadezhda Chirkova
Germán Kruszewski
Jos Rozen
Marc Dymetman
262
16
0
30 Jun 2023
Evolution of Efficient Symbolic Communication Codes
Anton Kolonin
146
2
0
04 Jun 2023
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Francois Meyer
Jan Buys
242
9
0
11 May 2023
What changes when you randomly choose BPE merge operations? Not much
First Workshop on Insights from Negative Results in NLP (Insights), 2023
Jonne Saleva
Constantine Lignos
177
12
0
04 May 2023
Tokenization Preference for Human and Machine Learning Model: An Annotation Study
Tatsuya Hiraoka
Tomoya Iwakura
203
1
0
21 Apr 2023
Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing
Tatsuya Hiraoka
Tomoya Iwakura
156
0
0
21 Apr 2023
Elementwise Language Representation
Du-Yeong Kim
Jeeeun Kim
240
0
0
27 Feb 2023
Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Kaiser Sun
Peng Qi
Yuhao Zhang
Lan Liu
William Yang Wang
Zhiheng Huang
270
11
0
19 Dec 2022
Extending the Subwording Model of Multilingual Pretrained Models for New Languages
K. Imamura
Eiichiro Sumita
VLM
248
4
0
29 Nov 2022
Incorporating Context into Subword Vocabularies
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Shaked Yehezkel
Yuval Pinter
257
14
0
13 Oct 2022
How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?
Conference of the Association for Machine Translation in the Americas (AMTA), 2022
Ali Araabi
Christof Monz
Vlad Niculae
271
14
0
10 Aug 2022
The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Special Interest Group on Computational Morphology and Phonology Workshop (SIGMORPHON), 2022
Khuyagbaatar Batsuren
Gábor Bella
Aryaman Arora
Viktor Martinović
Kyle Gorman
...
Magda Ševčíková
Kateřina Pelegrinová
Fausto Giunchiglia
Robert Bamler
Ekaterina Vylomova
248
51
0
15 Jun 2022
Local Byte Fusion for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Makesh Narsimhan Sreedhar
Xiangpeng Wan
Yu-Jie Cheng
Junjie Hu
612
7
0
23 May 2022
Improving Tokenisation by Alternative Treatment of Spaces
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Edward Gow-Smith
Harish Tayyar Madabushi
Carolina Scarton
Aline Villavicencio
279
25
0
08 Apr 2022
LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation
Keita Nonaka
Kazutaka Yamanouchi
Tomohiro I
Tsuyoshi Okita
Kazutaka Shimada
Hiroshi Sakamoto
161
8
0
28 Feb 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
371
207
0
20 Dec 2021
You should evaluate your language model on marginal likelihood over tokenisations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kris Cao
Laura Rimell
305
32
0
06 Sep 2021
Survey of Low-Resource Machine Translation
Computational Linguistics (CL), 2021
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
646
225
0
01 Sep 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
411
197
0
23 Jun 2021
How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation
Findings (Findings), 2021
Marco Gaido
Beatrice Savoldi
L. Bentivogli
Matteo Negri
Marco Turchi
233
15
0
28 May 2021
Joint Optimization of Tokenization and Downstream Model
Findings (Findings), 2021
Tatsuya Hiraoka
Sho Takase
Kei Uchiumi
Atsushi Keyaki
Naoaki Okazaki
207
19
0
26 May 2021
Multi-view Subword Regularization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Xinyi Wang
Sebastian Ruder
Graham Neubig
314
51
0
15 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
Italian National Conference on Sensors (INS), 2021
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
177
15
0
12 Mar 2021