Breaking Character: Are Subwords Good Enough for MRLs After All?

10 April 2022
Omri Keren
Tal Avinari
Reut Tsarfaty
Omer Levy
arXiv: 2204.04748

Papers citing "Breaking Character: Are Subwords Good Enough for MRLs After All?"

Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit
Shaltiel Shmidman
Avi Shmidman
Yuval Pinter
18 Mar 2025
MenakBERT -- Hebrew Diacriticizer
Ido Cohen
Jacob Gidron
Idan Pinto
03 Oct 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
10 Mar 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler
Matthieu Labeau
Chloé Clavel
22 Feb 2024
D-Nikud: Enhancing Hebrew Diacritization with LSTM and Pretrained Models
Adi Rosenthal
Nadav Shaked
30 Jan 2024
Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew
Eylon Gueta
Omer Goldman
Reut Tsarfaty
01 Nov 2023
Text Rendering Strategies for Pixel Language Models
Jonas F. Lotz
Elizabeth Salesky
Phillip Rust
Desmond Elliott
01 Nov 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
09 May 2023
Impact of Subword Pooling Strategy on Cross-lingual Event Detection
Shantanu Agarwal
Steven Fincke
Chris Jenkins
Scott Miller
Elizabeth Boschee
22 Feb 2023
Multilingual Sequence-to-Sequence Models for Hebrew NLP
Matan Eyal
Hila Noga
Roee Aharoni
Idan Szpektor
Reut Tsarfaty
19 Dec 2022
Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All
Eylon Guetta
Avi Shmidman
Shaltiel Shmidman
C. Shmidman
Joshua Guedalia
Moshe Koppel
Dan Bareket
Amit Seker
Reut Tsarfaty
28 Nov 2022
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
13 Oct 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
14 Jul 2022
ParaShoot: A Hebrew Question Answering Dataset
Omri Keren
Omer Levy
23 Sep 2021