arXiv:2212.01304
Subword-Delimited Downsampling for Better Character-Level Translation
2 December 2022
Lukas Edman
Antonio Toral
Gertjan van Noord
Papers citing "Subword-Delimited Downsampling for Better Character-Level Translation"
7 / 7 papers shown
Title
CUTE: Measuring LLMs' Understanding of Their Tokens
Lukas Edman
Helmut Schmid
Alexander M. Fraser
34
3
0
23 Sep 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
32
3
0
22 Apr 2024
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
21
84
0
12 May 2023
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation
Francois Meyer
Jan Buys
33
8
0
11 May 2023
Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation
Lukas Edman
Gabriele Sarti
Antonio Toral
Gertjan van Noord
Arianna Bisazza
16
11
0
28 Feb 2023
Why don't people use character-level machine translation?
Jindřich Libovický
Helmut Schmid
Alexander M. Fraser
65
28
0
15 Oct 2021
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
Hicham El Boukkouri
Olivier Ferret
Thomas Lavergne
Hiroshi Noji
Pierre Zweigenbaum
Junichi Tsujii
71
156
0
20 Oct 2020