ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.13348
  4. Cited By
Analyzing Cognitive Plausibility of Subword Tokenization

Analyzing Cognitive Plausibility of Subword Tokenization

20 October 2023
Lisa Beinborn
Yuval Pinter
ArXivPDFHTML

Papers citing "Analyzing Cognitive Plausibility of Subword Tokenization"

12 / 12 papers shown
Title
Splintering Nonconcatenative Languages for Better Tokenization
Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit
Shaltiel Shmidman
Avi Shmidman
Yuval Pinter
47
0
0
18 Mar 2025
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer
  Training
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
Pavel Chizhov
Catherine Arnett
Elizaveta Korotkova
Ivan P. Yamshchikov
30
2
0
06 Sep 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV
  Generalization Challenge
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
19
9
0
20 Apr 2024
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta
Tatsuya Hiraoka
Naoaki Okazaki
Rico Sennrich
Yuval Pinter
19
2
0
30 Mar 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation
  with Model Performance
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
30
22
0
10 Mar 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
25
14
0
02 Mar 2024
Rethinking Tokenization: Crafting Better Tokenizers for Large Language
  Models
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
Jinbiao Yang
LLMAG
24
9
0
01 Mar 2024
Knowledge of Pretrained Language Models on Surface Information of Tokens
Knowledge of Pretrained Language Models on Surface Information of Tokens
Tatsuya Hiraoka
Naoaki Okazaki
17
1
0
15 Feb 2024
Tokenization Preference for Human and Machine Learning Model: An
  Annotation Study
Tokenization Preference for Human and Machine Learning Model: An Annotation Study
Tatsuya Hiraoka
Tomoya Iwakura
6
1
0
21 Apr 2023
How Good is Your Tokenizer? On the Monolingual Performance of
  Multilingual Language Models
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
Morphology Matters: A Multilingual Language Modeling Analysis
Morphology Matters: A Multilingual Language Modeling Analysis
Hyunji Hayley Park
Katherine J. Zhang
Coleman Haley
K. Steimel
Han Liu
Lane Schwartz
39
39
0
11 Dec 2020
Improving Multilingual Models with Language-Clustered Vocabularies
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
56
0
24 Oct 2020
1