Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.13348
Cited By
Analyzing Cognitive Plausibility of Subword Tokenization
20 October 2023
Lisa Beinborn
Yuval Pinter
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Analyzing Cognitive Plausibility of Subword Tokenization"
12 / 12 papers shown
Title
Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit
Shaltiel Shmidman
Avi Shmidman
Yuval Pinter
47
0
0
18 Mar 2025
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
Pavel Chizhov
Catherine Arnett
Elizaveta Korotkova
Ivan P. Yamshchikov
30
2
0
06 Sep 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
19
9
0
20 Apr 2024
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta
Tatsuya Hiraoka
Naoaki Okazaki
Rico Sennrich
Yuval Pinter
19
2
0
30 Mar 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
30
22
0
10 Mar 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
25
14
0
02 Mar 2024
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
Jinbiao Yang
LLMAG
24
9
0
01 Mar 2024
Knowledge of Pretrained Language Models on Surface Information of Tokens
Tatsuya Hiraoka
Naoaki Okazaki
17
1
0
15 Feb 2024
Tokenization Preference for Human and Machine Learning Model: An Annotation Study
Tatsuya Hiraoka
Tomoya Iwakura
6
1
0
21 Apr 2023
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
Morphology Matters: A Multilingual Language Modeling Analysis
Hyunji Hayley Park
Katherine J. Zhang
Coleman Haley
K. Steimel
Han Liu
Lane Schwartz
39
39
0
11 Dec 2020
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
56
0
24 Oct 2020
1