2402.14614
v1
v2 (latest)
Two Counterexamples to Tokenization and the Noiseless Channel
22 February 2024
Marco Cognetta
Vilém Zouhar
Sangwhan Moon
Naoaki Okazaki
Papers citing "Two Counterexamples to Tokenization and the Noiseless Channel"
4 / 4 papers shown
Title
Length-MAX Tokenizer for Language Models
Dong Dong
Weijie Su
VLM
118
0
0
25 Nov 2025
UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8
Preston Firestone
Shubham Ugare
Gagandeep Singh
Sasa Misailovic
80
1
0
05 Nov 2025
Aneurysm Growth Time Series Reconstruction Using Physics-informed Autoencoder
Jiacheng Wu
AI4CE
76
10
0
05 Oct 2025
Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
Saketh Reddy Vemula
Sandipan Dandapat
D. Sharma
Parameswari Krishnamurthy
199
0
0
11 Aug 2025