arXiv:2403.01289
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
2 March 2024
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
Papers citing "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer, Ivan Vulić, E. Ponti
25 Mar 2025
Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit, Shaltiel Shmidman, Avi Shmidman, Yuval Pinter
18 Mar 2025
Tokenization is Sensitive to Language Variation
Anna Wegmann, Dong Nguyen, David Jurgens
24 Feb 2025
Hit the Sweet Spot! Span-Level Ensemble for Large Language Models
Yangyifan Xu, Jianghao Chen, Junhong Wu, Jiajun Zhang
MoE
27 Sep 2024
Zero-Shot Tokenizer Transfer
Benjamin Minixhofer, E. Ponti, Ivan Vulić
VLM
13 May 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella
20 Apr 2024
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta, Tatsuya Hiraoka, Naoaki Okazaki, Rico Sennrich, Yuval Pinter
30 Mar 2024
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Catherine Arnett, Pamela D. Rivière, Tyler A. Chang, Sean Trott
20 Mar 2024
Tokenization Is More Than Compression
Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
28 Feb 2024
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat
26 Sep 2016