arXiv:2204.04748
Breaking Character: Are Subwords Good Enough for MRLs After All?
10 April 2022
Omri Keren
Tal Avinari
Reut Tsarfaty
Omer Levy
Papers citing
"Breaking Character: Are Subwords Good Enough for MRLs After All?"
14 papers
Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit
Shaltiel Shmidman
Avi Shmidman
Yuval Pinter
57
0
0
18 Mar 2025
MenakBERT -- Hebrew Diacriticizer
Ido Cohen
Jacob Gidron
Idan Pinto
VLM
16
0
0
03 Oct 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
43
22
0
10 Mar 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler
Matthieu Labeau
Chloé Clavel
VLM
30
2
0
22 Feb 2024
D-Nikud: Enhancing Hebrew Diacritization with LSTM and Pretrained Models
Adi Rosenthal
Nadav Shaked
11
0
0
30 Jan 2024
Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew
Eylon Gueta
Omer Goldman
Reut Tsarfaty
11
1
0
01 Nov 2023
Text Rendering Strategies for Pixel Language Models
Jonas F. Lotz
Elizabeth Salesky
Phillip Rust
Desmond Elliott
VLM
22
11
0
01 Nov 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
32
2
0
09 May 2023
Impact of Subword Pooling Strategy on Cross-lingual Event Detection
Shantanu Agarwal
Steven Fincke
Chris Jenkins
Scott Miller
Elizabeth Boschee
14
2
0
22 Feb 2023
Multilingual Sequence-to-Sequence Models for Hebrew NLP
Matan Eyal
Hila Noga
Roee Aharoni
Idan Szpektor
Reut Tsarfaty
27
4
0
19 Dec 2022
Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All
Eylon Guetta
Avi Shmidman
Shaltiel Shmidman
C. Shmidman
Joshua Guedalia
Moshe Koppel
Dan Bareket
Amit Seker
Reut Tsarfaty
VLM
16
14
0
28 Nov 2022
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
35
8
0
13 Oct 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
30
46
0
14 Jul 2022
ParaShoot: A Hebrew Question Answering Dataset
Omri Keren
Omer Levy
29
17
0
23 Sep 2021