Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.13292
Cited By
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
20 April 2024
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge"
5 / 5 papers shown
Title
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
34
12
0
08 Oct 2024
Smirk: An Atomically Complete Tokenizer for Molecular Foundation Models
Alexius Wadell
Anoushka Bhutani
Venkatasubramanian Viswanathan
25
0
0
19 Sep 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
32
22
0
10 Mar 2024
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
48
12
0
02 Feb 2024
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
103
91
0
06 Oct 2022
1