2006.06202
Cited By
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
11 June 2020
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
Papers citing "A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages"
34 / 34 papers shown
Title
Lazy But Effective: Collaborative Personalized Federated Learning with Heterogeneous Data
Ljubomir Rokvic
Panayiotis Danassis
Boi Faltings
FedML
35
0
0
05 May 2025
TigerLLM -- A Family of Bangla Large Language Models
Nishat Raihan
Marcos Zampieri
48
0
0
14 Mar 2025
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
Layba Fiaz
Munief Hassan Tahir
Sana Shams
Sarmad Hussain
49
0
0
24 Feb 2025
Exploring Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
59
1
0
17 Feb 2025
Data Processing for the OpenGPT-X Model Family
Nicolo' Brandizzi
Hammam Abdelwahab
Anirban Bhowmick
Lennard Helmer
Benny Jörg Stein
...
Georg Rehm
Dennis Wegener
Nicolas Flores-Herr
Joachim Kohler
Johannes Leveling
VLM
79
2
0
11 Oct 2024
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models
Nandini Mundra
Aditya Nanda Kishore
Raj Dabre
Ratish Puduppully
Anoop Kunchukuttan
Mitesh Khapra
30
3
0
08 Jul 2024
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding
Ahmad Idrissi-Yaghir
Amin Dada
Henning Schafer
Kamyar Arzideh
Giulia Baldini
...
Peter A. Horn
Christin Seifert
F. Nensa
Jens Kleesiek
Christoph M. Friedrich
AI4MH
29
2
0
08 Apr 2024
Training a Bilingual Language Model by Mapping Tokens onto a Shared Character Space
Aviad Rom
Kfir Bar
24
1
0
25 Feb 2024
RoBERTurk: Adjusting RoBERTa for Turkish
Nuri Tas
17
1
0
07 Jan 2024
Unsupervised Paraphrasing of Multiword Expressions
Takashi Wada
Yuji Matsumoto
Timothy Baldwin
Jey Han Lau
24
0
0
02 Jun 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
27
13
0
22 May 2023
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Dũng Nguyễn Mạnh
Nam Le Hai
An Dau
A. Nguyen
Khanh N. Nghiem
Jingnan Guo
Nghi D. Q. Bui
26
13
0
09 May 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
28
40
0
07 Apr 2023
FairDistillation: Mitigating Stereotyping in Language Models
Pieter Delobelle
Bettina Berendt
20
8
0
10 Jul 2022
You Are What You Write: Preserving Privacy in the Era of Large Language Models
Richard Plant
V. Giuffrida
Dimitra Gkatzia
PILM
17
19
0
20 Apr 2022
Breaking Character: Are Subwords Good Enough for MRLs After All?
Omri Keren
Tal Avinari
Reut Tsarfaty
Omer Levy
28
15
0
10 Apr 2022
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
Julien Abadji
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
CLL
34
153
0
17 Jan 2022
IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra
Alham Fikri Aji
Samuel Louvan
Fahrurrozi Rahman
Clara Vania
24
29
0
27 Oct 2021
MFAQ: a Multilingual FAQ Dataset
Maxime De Bruyn
Ehsan Lotfi
Jeska Buhmann
Walter Daelemans
RALM
42
21
0
27 Sep 2021
ParaShoot: A Hebrew Question Answering Dataset
Omri Keren
Omer Levy
29
17
0
23 Sep 2021
Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models
C. Carrino
Jordi Armengol-Estapé
Ona de Gibert Bonet
Asier Gutiérrez-Fandiño
Aitor Gonzalez-Agirre
Martin Krallinger
Marta Villegas
8
20
0
16 Sep 2021
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Haoran Xu
Benjamin Van Durme
Kenton W. Murray
42
57
0
09 Sep 2021
PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining
Machel Reid
Mikel Artetxe
VLM
42
26
0
04 Aug 2021
Machine Translation into Low-resource Language Varieties
Sachin Kumar
Antonios Anastasopoulos
S. Wintner
Yulia Tsvetkov
11
29
0
12 Jun 2021
Bertinho: Galician BERT Representations
David Vilares
Marcos Garcia
Carlos Gómez-Rodríguez
57
22
0
25 Mar 2021
Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability
Wei-Tsung Kao
Hung-yi Lee
16
16
0
12 Mar 2021
The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models
Go Inoue
Bashar Alhafni
Nurpeiis Baimukan
Houda Bouamor
Nizar Habash
35
223
0
11 Mar 2021
Pre-Training BERT on Arabic Tweets: Practical Considerations
Ahmed Abdelali
Sabit Hassan
Hamdy Mubarak
Kareem Darwish
Younes Samih
20
96
0
21 Feb 2021
AraGPT2: Pre-Trained Transformer for Arabic Language Generation
Wissam Antoun
Fady Baly
Hazem M. Hajj
VLM
19
103
0
31 Dec 2020
AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding
Wissam Antoun
Fady Baly
Hazem M. Hajj
17
102
0
31 Dec 2020
Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages
Kushal Kumar Jain
Adwait Deshpande
Kumar Shridhar
F. Laumann
Ayushman Dash
43
51
0
04 Nov 2020
Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models
Evan Williams
Paul Rodrigues
Valerie Novak
34
42
0
05 Sep 2020
KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media
Ali Safaya
Moutasem Abdullatif
Deniz Yuret
31
314
0
26 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
19
71
0
20 Jul 2020