1909.03341
Cited By
Neural Machine Translation with Byte-Level Subwords
7 September 2019
Changhan Wang
Kyunghyun Cho
Jiatao Gu
Papers citing "Neural Machine Translation with Byte-Level Subwords"
13 / 13 papers shown
Title
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
58
0
0
18 Mar 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
44
0
0
24 Feb 2025
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation
Francois Meyer
Jan Buys
24
1
0
29 Mar 2024
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
Vilém Zouhar
AAML
30
0
0
29 Jan 2024
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
Fei Yuan
Shuai Yuan
Zhiyong Wu
Lei Li
20
10
0
15 Nov 2023
A Comparative Study of Pretrained Language Models for Long Clinical Text
Yikuan Li
R. M. Wehbe
F. Ahmad
Hanyin Wang
Yuan Luo
LM&MA
ELM
VLM
MedIm
24
78
0
27 Jan 2023
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
14
148
0
15 Apr 2022
Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences
Yikuan Li
R. M. Wehbe
F. Ahmad
Hanyin Wang
Yuan Luo
VLM
MedIm
135
84
0
27 Jan 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
23
138
0
20 Dec 2021
Scaling Law for Recommendation Models: Towards General-purpose User Representations
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Max Nihlén Ramström
Jisu Jeong
Jung-Woo Ha
S. Kim
ELM
21
38
0
15 Nov 2021
Discontinuous Grammar as a Foreign Language
Daniel Fernández-González
Carlos Gómez-Rodríguez
45
9
0
20 Oct 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
11
210
0
11 Mar 2021
Towards End-to-End In-Image Neural Machine Translation
Elman Mansimov
Mitchell Stern
M. Chen
Orhan Firat
Jakob Uszkoreit
Puneet Jain
22
25
0
20 Oct 2020