2306.01393
Cited By
Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
2 June 2023
Benoist Wolleb
Romain Silvestri
Giorgos Vernikos
Ljiljana Dolamic
Andrei Popescu-Belis
Papers citing
"Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT"
5 / 5 papers shown
Title
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression
Noel Elias
H. Esfahanizadeh
Kaan Kale
S. Vishwanath
Muriel Médard
33
0
0
28 Oct 2024
Why don't people use character-level machine translation?
Jindrich Libovický
Helmut Schmid
Alexander M. Fraser
63
28
0
15 Oct 2021
Subword Mapping and Anchoring across Languages
Giorgos Vernikos
Andrei Popescu-Belis
62
12
0
09 Sep 2021
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
254
1,896
0
10 Jan 2017
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,740
0
26 Sep 2016