Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.08490
Cited By
Multi-view Subword Regularization
15 March 2021
Xinyi Wang
Sebastian Ruder
Graham Neubig
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multi-view Subword Regularization"
18 / 18 papers shown
Title
Deterministic Reversible Data Augmentation for Neural Machine Translation
Jiashu Yao
Heyan Huang
Zeming Liu
Yuhang Guo
49
0
0
21 Feb 2025
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation
Francois Meyer
Jan Buys
29
1
0
29 Mar 2024
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Nadezhda Chirkova
Sergey Troshin
21
8
0
01 Aug 2023
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation
Francois Meyer
Jan Buys
33
8
0
11 May 2023
Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling
Zhijun Wang
Xuebo Liu
Min Zhang
25
11
0
23 Nov 2022
Subword Segmental Language Modelling for Nguni Languages
Francois Meyer
Jan Buys
8
4
0
12 Oct 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
30
46
0
14 Jul 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Nishant Kambhatla
Logan Born
Anoop Sarkar
4
16
0
01 Apr 2022
Zero-shot Commonsense Question Answering with Cloze Translation and Consistency Optimization
Zi-Yi Dou
Nanyun Peng
ELM
15
26
0
01 Jan 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
23
140
0
20 Dec 2021
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
Arij Riabi
Benoît Sagot
Djamé Seddah
26
15
0
26 Oct 2021
Subword Mapping and Anchoring across Languages
Giorgos Vernikos
Andrei Popescu-Belis
62
12
0
09 Sep 2021
Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau
Noah A. Smith
23
27
0
16 Jun 2021
Evaluating Various Tokenizers for Arabic Text Classification
Zaid Alyafeai
Maged S. Al-Shaibani
Mustafa Ghaleb
Irfan Ahmad
13
41
0
14 Jun 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
25
210
0
11 Mar 2021
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
65
0
24 Oct 2020
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
244
491
0
16 Oct 2019
Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw
VLM
250
13,360
0
25 Aug 2014
1