ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

BPE-Dropout: Simple and Effective Subword Regularization

29 October 2019
Ivan Provilkov
Dmitrii Emelianenko
Elena Voita

Papers citing "BPE-Dropout: Simple and Effective Subword Regularization"

47 / 147 papers shown
Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages
Vaidehi Patil
Partha P. Talukdar
Sunita Sarawagi
16
21
0
03 Mar 2022
LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation
Keita Nonaka
Kazutaka Yamanouchi
Tomohiro I
Tsuyoshi Okita
Kazutaka Shimada
H. Sakamoto
24
8
0
28 Feb 2022
Refining the state-of-the-art in Machine Translation, optimizing NMT for the JA <-> EN language pair by leveraging personal domain expertise
Matthew Bieda
16
1
0
23 Feb 2022
Semantic Code Classification for Automated Machine Learning
P. Guseva
Anastasia Drozdova
N. Denisenko
Daria Sapozhnikova
Ivan Pyaternev
Anna Scherbakova
A.E. Ustuzhanin
21
0
0
25 Jan 2022
Fine-Tuning Transformers: Vocabulary Transfer
Vladislav D. Mosin
Igor Samenko
Alexey Tikhonov
Borislav M. Kozlovskii
Ivan P. Yamshchikov
9
19
0
29 Dec 2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
30
141
0
20 Dec 2021
L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim
Gwangmo Song
Sihaeng Lee
Sangyun Kim
Yewon Seo
Soonyoung Lee
S. Kim
Honglak Lee
Kyunghoon Bae
23
25
0
22 Nov 2021
NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian
Oleksii Hrinchuk
Virginia Adams
Oleksii Kuchaiev
VLM
22
16
0
16 Nov 2021
Why don't people use character-level machine translation?
Jindrich Libovický
Helmut Schmid
Alexander Fraser
65
28
0
15 Oct 2021
How BPE Affects Memorization in Transformers
Eugene Kharitonov
Marco Baroni
Dieuwke Hupkes
163
32
0
06 Oct 2021
Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models
Robert Wolfe
Aylin Caliskan
85
51
0
01 Oct 2021
EdinSaar@WMT21: North-Germanic Low-Resource Multilingual NMT
Svetlana Tchistiakova
Jesujoba Oluwadara Alabi
Koel Dutta Chowdhury
Sourav Dutta
Dana Ruiter
VLM
28
6
0
29 Sep 2021
Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise
Noëmi Aepli
Rico Sennrich
23
17
0
14 Sep 2021
Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages
Antonis Maronikolakis
Philipp Dufter
Hinrich Schütze
19
17
0
13 Sep 2021
Subword Mapping and Anchoring across Languages
Giorgos Vernikos
Andrei Popescu-Belis
67
12
0
09 Sep 2021
You should evaluate your language model on marginal likelihood over tokenisations
Kris Cao
Laura Rimell
31
23
0
06 Sep 2021
How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?
Chantal Amrhein
Rico Sennrich
27
13
0
02 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
31
147
0
01 Sep 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
51
152
0
23 Jun 2021
Consistency Regularization for Cross-Lingual Fine-Tuning
Bo Zheng
Li Dong
Shaohan Huang
Wenhui Wang
Zewen Chi
Saksham Singhal
Wanxiang Che
Ting Liu
Xia Song
Furu Wei
19
58
0
15 Jun 2021
Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation
Xin Liu
Baosong Yang
Dayiheng Liu
Haibo Zhang
Weihua Luo
Min Zhang
Haiying Zhang
Jinsong Su
18
18
0
11 Jun 2021
Joint Optimization of Tokenization and Downstream Model
Tatsuya Hiraoka
Sho Takase
Kei Uchiumi
Atsushi Keyaki
Naoaki Okazaki
16
17
0
26 May 2021
HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
Robert Mroczkowski
Piotr Rybak
Alina Wróblewska
Ireneusz Gawlik
28
81
0
04 May 2021
AlloST: Low-resource Speech Translation without Source Transcription
Yao-Fei Cheng
Hung-Shin Lee
Hsin-Min Wang
19
8
0
01 May 2021
Robust Open-Vocabulary Translation from Visual Text Representations
Elizabeth Salesky
David Etter
Matt Post
VLM
8
39
0
16 Apr 2021
Smartphone Camera Oximetry in an Induced Hypoxemia Study
Jason S. Hoffman
Varun K. Viswanath
Xinyi Ding
Matthew J. Thompson
Eric C. Larson
Shwetak N. Patel
Edward J Wang
10
25
0
31 Mar 2021
Multi-view Subword Regularization
Xinyi Wang
Sebastian Ruder
Graham Neubig
19
45
0
15 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
18
14
0
12 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
36
210
0
11 Mar 2021
Learning Word-Level Confidence For Subword End-to-End ASR
David Qiu
Qiujia Li
Yanzhang He
Yu Zhang
Bo-wen Li
...
Deepti Bhatia
Wei Li
Ke Hu
Tara N. Sainath
Ian McGraw
24
32
0
11 Mar 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Neural Machine Translation: A Review of Methods, Resources, and Tools
Zhixing Tan
Shuo Wang
Zonghan Yang
Gang Chen
Xuancheng Huang
Maosong Sun
Yang Liu
3DV
AI4TS
15
105
0
31 Dec 2020
Subword Sampling for Low Resource Word Alignment
Ehsaneddin Asgari
Masoud Jalili Sabet
Philipp Dufter
Christoph Ringlstetter
Hinrich Schütze
11
5
0
21 Dec 2020
Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks
Modestas Filipavicius
Matteo Manica
Joris Cadow
María Rodríguez Martínez
10
13
0
05 Dec 2020
Using Multiple Subwords to Improve English-Esperanto Automated Literary Translation Quality
Alberto Poncelas
J. Buts
J. Hadley
Andy Way
11
2
0
28 Nov 2020
Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation
Annette Rios Gonzales
Mathias Müller
Rico Sennrich
11
19
0
03 Nov 2020
The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task
Alexandra Chronopoulou
Dario Stojanovski
Viktor Hangya
Alexander Fraser
37
5
0
25 Oct 2020
Query-Key Normalization for Transformers
Alex Henry
Prudhvi Raj Dachapally
S. Pawar
Yuxuan Chen
17
75
0
08 Oct 2020
Alleviating the Inequality of Attention Heads for Neural Machine Translation
Zewei Sun
Shujian Huang
Xinyu Dai
Jiajun Chen
8
7
0
21 Sep 2020
Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition
Egor Lakomkin
Jahn Heymann
Ilya Sklyar
Simon Wiesler
9
8
0
10 Aug 2020
Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification
Chetanya Rastogi
Nikka Mofid
Fang-I Hsiao
29
11
0
02 Jul 2020
2kenize: Tying Subword Sequences for Chinese Script Conversion
Pranav A
Isabelle Augenstein
22
1
0
07 May 2020
Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
Xuanli He
Gholamreza Haffari
Mohammad Norouzi
19
45
0
03 May 2020
Evaluating Robustness to Input Perturbations for Neural Machine Translation
Xing Niu
Prashant Mathur
Georgiana Dinu
Yaser Al-Onaizan
AAML
14
64
0
01 May 2020
Syntax-aware Data Augmentation for Neural Machine Translation
Sufeng Duan
Hai Zhao
Dongdong Zhang
Rui-cang Wang
8
16
0
29 Apr 2020
Adversarial Subword Regularization for Robust Neural Machine Translation
Jungsoo Park
Mujeen Sung
Jinhyuk Lee
Jaewoo Kang
8
8
0
29 Apr 2020
Med7: a transferable clinical natural language processing model for electronic health records
Andrey Kormilitzin
N. Vaci
Qiang Liu
A. Nevado-Holgado
6
115
0
03 Mar 2020