arXiv:2110.06744
Masader: Metadata Sourcing for Arabic Text and Speech Data Resources
13 October 2021
Zaid Alyafeai
Maraim Masoud
Mustafa Ghaleb
Maged S. Al-Shaibani
Papers citing
"Masader: Metadata Sourcing for Arabic Text and Speech Data Resources"
15 / 15 papers shown
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs
Zaid Alyafeai
Maged S. Al-Shaibani
Bernard Ghanem
11
0
0
26 May 2025
Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialect Arabic Datasets
Amr Keleg
Walid Magdy
Sharon Goldwater
59
3
0
18 May 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Ahmet Üstün
Viraat Aryabumi
Zheng-Xin Yong
Wei-Yin Ko
Daniel D'souza
...
Shayne Longpre
Niklas Muennighoff
Marzieh Fadaee
Julia Kreutzer
Sara Hooker
ALM
ELM
SyDa
LRM
98
230
0
12 Feb 2024
Toxic language detection: a systematic review of Arabic datasets
Imene Bensalem
Paolo Rosso
Hanane Zitouni
73
5
0
12 Dec 2023
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Hugo Laurençon
Lucile Saulnier
Thomas Wang
Christopher Akiki
Albert Villanova del Moral
...
Violette Lepercq
Suzana Ilić
Margaret Mitchell
Sasha Luccioni
Yacine Jernite
AI4CE
AILaw
75
169
0
07 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni
Atharva Kulkarni
Sara Shatnawi
Hanan Aldarmaki
37
9
0
28 Feb 2023
In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages
Asim Ersoy
Gerson Vizcarra
T. Mayeesha
Benjamin Muller
67
2
0
23 Feb 2023
SAIDS: A Novel Approach for Sentiment Analysis Informed of Dialect and Sarcasm
Abdelrahman Kaseb
Mona Farouk
26
9
0
06 Jan 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
125
50
0
19 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
453
2,398
0
09 Nov 2022
Maknuune: A Large Open Palestinian Arabic Lexicon
Shahd Dibas
Christian Khairallah
Nizar Habash
Omar Fayez Sadi
Tariq Sairafy
Karmel Sarabta
Abrar Ardah
CVBM
55
4
0
24 Oct 2022
Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets
Yousef Altaher
A. Fadel
Mazen Alotaibi
Mazen Alyazidi
Mishari Al-Mutairi
...
Mustafa Ghaleb
Nouamane Tazi
Raed Alharbi
Maraim Masoud
Zaid Alyafeai
75
10
0
01 Aug 2022
NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages
Samuel Cahyawijaya
Alham Fikri Aji
Holy Lovenia
Genta Indra Winata
Bryan Wilie
...
Fajri Koto
David Moeljadi
Karissa Vincentio
Ade Romadhony
Ayu Purwarianti
80
5
0
21 Jul 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
104
106
0
24 Mar 2022
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Angelina McMillan-Major
Zaid Alyafeai
Stella Biderman
Kimbo Chen
F. Toni
...
Aitor Soroa Etxabe
Pedro Ortiz Suarez
Zeerak Talat
Daniel Alexander van Strien
Yacine Jernite
83
14
0
25 Jan 2022