2306.16837
Cited By
A Formal Perspective on Byte-Pair Encoding
29 June 2023
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Tim Vieira
Mrinmaya Sachan
Ryan Cotterell
Papers citing
"A Formal Perspective on Byte-Pair Encoding"
10 / 10 papers shown
Title
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
70
1
0
10 Jan 2025
Morphological Typology in BPE Subword Productivity and Language Modeling
Iñigo Parra
34
0
0
31 Oct 2024
Tokenization as Finite-State Transduction
Marco Cognetta
Naoaki Okazaki
21
0
0
21 Oct 2024
Batching BPE Tokenization Merges
Alexander P. Morgan
30
0
0
05 Aug 2024
Improving Self Consistency in LLMs through Probabilistic Tokenization
Ashutosh Sathe
Divyanshu Aggarwal
Sunayana Sitaram
37
4
0
04 Jul 2024
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
Sunil Kumar Kopparapu
Ashish Panda
28
0
0
29 Apr 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
13
0
15 Mar 2024
Two Counterexamples to Tokenization and the Noiseless Channel
Marco Cognetta
Vilém Zouhar
Sangwhan Moon
Naoaki Okazaki
27
0
0
22 Feb 2024
LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation
Christian Munley
Aaron Jarmusch
Sunita Chandrasekaran
27
16
0
08 Oct 2023
Tokenization and the Noiseless Channel
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Mrinmaya Sachan
Ryan Cotterell
30
31
0
29 Jun 2023