Getting the most out of your tokenizer for pre-training and domain adaptation
Gautier Dagan, Gabriele Synnaeve, Baptiste Rozière
arXiv: 2402.01035, 1 February 2024
Papers citing "Getting the most out of your tokenizer for pre-training and domain adaptation" (4 of 4 papers shown):
Cross-lingual Transfer of Reward Models in Multilingual Alignment. Jiwoo Hong, Noah Lee, Rodrigo Martínez-Castaño, César Rodríguez, James Thorne. 23 Oct 2024.
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles. Buu Phan, Brandon Amos, Itai Gat, Marton Havasi, Matthew Muckley, Karen Ullrich. 11 Oct 2024.
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. Yue Wang, Weishi Wang, Shafiq R. Joty, S. Hoi. 02 Sep 2021.
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych. 31 Dec 2020.