Two Counterexamples to Tokenization and the Noiseless Channel

22 February 2024
Marco Cognetta, Vilém Zouhar, Sangwhan Moon, Naoaki Okazaki

Papers citing "Two Counterexamples to Tokenization and the Noiseless Channel"

4 / 4 papers shown

Length-MAX Tokenizer for Language Models
Dong Dong, Weijie Su
25 Nov 2025

UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8
Preston Firestone, Shubham Ugare, Gagandeep Singh, Sasa Misailovic
05 Nov 2025

Aneurysm Growth Time Series Reconstruction Using Physics-informed Autoencoder
Jiacheng Wu
05 Oct 2025

Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
Saketh Reddy Vemula, Sandipan Dandapat, D. Sharma, Parameswari Krishnamurthy
11 Aug 2025