Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.06265
Cited By
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
10 March 2024
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance"
6 / 6 papers shown
Title
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
Buu Phan
Brandon Amos
Itai Gat
Marton Havasi
Matthew Muckley
Karen Ullrich
39
1
0
11 Oct 2024
Smirk: An Atomically Complete Tokenizer for Molecular Foundation Models
Alexius Wadell
Anoushka Bhutani
Venkatasubramanian Viswanathan
25
0
0
19 Sep 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
24
27
0
28 Feb 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
46
6
0
22 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
124
349
0
01 Feb 2024
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
1