arXiv: 2203.13357
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
24 March 2022
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
Rahmad Mahendra
Kemal Kurniawan
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
Papers citing "One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia"
50 / 50 papers shown
Title
Improving Informally Romanized Language Identification
Adrian Benton
Alexander Gutkin
Christo Kirov
Brian Roark
38
0
0
30 Apr 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
Tsz Chung Cheng
Chung Shing Cheng
Chaak Ming Lau
Eugene Tin-Ho Lam
Chun Yat Wong
Hoi On Yu
Cheuk Hei Chong
ELM
53
1
0
16 Mar 2025
Designing Speech Technologies for Australian Aboriginal English: Opportunities, Risks and Participation
Ben Hutchinson
Celeste Rodríguez Louro
Glenys Collard
Ned Cooper
57
0
0
05 Mar 2025
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
Muhammad Farid Adilazuarda
M. Wijanarko
Lucky Susanto
Khumaisa Nur'aini
Derry Wijaya
Alham Fikri Aji
49
0
0
25 Feb 2025
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages
Jia Guo
Longxu Dou
Guangtao Zeng
Stanley Kok
Wei Lu
Qian Liu
ELM
LRM
65
1
0
02 Dec 2024
Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models
Garry Kuwanto
Chaitanya Agarwal
Genta Indra Winata
Derry Wijaya
48
1
0
30 Oct 2024
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
Wenxuan Zhang
Hou Pong Chan
Yiran Zhao
Mahani Aljunied
Jianyu Wang
...
Zhiqiang Hu
Weiwen Xu
Yew Ken Chia
Xin Li
Li Bing
LRM
41
0
0
29 Jul 2024
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects
Orevaoghene Ahia
Anuoluwapo Aremu
Diana Abagyan
Hila Gonen
David Ifeoluwa Adelani
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
48
3
0
27 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
69
9
0
14 Jun 2024
IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces
Fajri Koto
Rahmad Mahendra
Nurul Aisyah
Timothy Baldwin
LRM
59
16
0
02 Apr 2024
Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages
Joanito Agili Lopo
Radius Tanone
36
2
0
01 Apr 2024
LLMs Are Few-Shot In-Context Low-Resource Language Learners
Samuel Cahyawijaya
Holy Lovenia
Pascale Fung
33
32
0
25 Mar 2024
Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service
Mirza Alim Mutasodirin
Radityo Eko Prasojo
Achmad F. Abka
Hanif Rasyidi
VLM
12
0
0
19 Mar 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
19
1
0
04 Mar 2024
Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese
Rifki Afina Putri
Faiz Ghifari Haznitrama
Dea Adhista
Alice H. Oh
37
14
0
27 Feb 2024
Could We Have Had Better Multilingual LLMs If English Was Not the Central Language?
Ryandito Diandaru
Lucky Susanto
Zilu Tang
Ayu Purwarianti
Derry Wijaya
25
1
0
21 Feb 2024
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
Zhiwei He
Xing Wang
Wenxiang Jiao
Zhuosheng Zhang
Rui Wang
Shuming Shi
Zhaopeng Tu
ALM
23
24
0
23 Jan 2024
Natural Language Processing for Dialects of a Language: A Survey
Aditya Joshi
Raj Dabre
Diptesh Kanojia
Zhuang Li
Haolan Zhan
Gholamreza Haffari
Doris Dippold
LM&MA
10
27
0
11 Jan 2024
IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages
Muhammad Farid Adilazuarda
Samuel Cahyawijaya
Genta Indra Winata
Pascale Fung
Ayu Purwarianti
27
11
0
21 Nov 2023
A Material Lens on Coloniality in NLP
William B. Held
Camille Harris
Michael Best
Diyi Yang
10
11
0
14 Nov 2023
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia
Lucky Susanto
Ryandito Diandaru
Adila Alfa Krisnadhi
Ayu Purwarianti
Derry Wijaya
11
0
0
02 Nov 2023
IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems
Muhammad Dehan Al Kautsar
Rahmah Khoirussyifa' Nurdini
Samuel Cahyawijaya
Genta Indra Winata
Ayu Purwarianti
11
0
0
02 Nov 2023
Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation
A. Seza Doğruöz
Sunayana Sitaram
Zheng-Xin Yong
19
13
0
31 Oct 2023
Quantifying the Dialect Gap and its Correlates Across Languages
Anjali Kantharuban
Ivan Vulić
Anna Korhonen
54
19
0
23 Oct 2023
CebuaNER: A New Baseline Cebuano Named Entity Recognition Model
Ma. Beatrice Emanuela Pilar
Ellyza Mari Papas
Mary Loise Buenaventura
Dane Dedoroy
M. D. Montefalcon
Jay Rhald Padilla
Lany L. Maceda
Mideth B. Abisado
Joseph Marvin Imperial
8
1
0
01 Oct 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Dea Adhista
Emmanuel Dave
...
Genta Indra Winata
David Moeljadi
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
34
7
0
19 Sep 2023
AlbNER: A Corpus for Named Entity Recognition in Albanian
Erion Çano
14
1
0
15 Sep 2023
Lexical Diversity in Kinship Across Languages and Dialects
H. Khalilia
Gábor Bella
Abed Alhakim Freihat
Shandy Darma
Fausto Giunchiglia
6
7
0
24 Aug 2023
Multi-lingual and Multi-cultural Figurative Language Understanding
Anubha Kabra
Emmy Liu
Simran Khanuja
Alham Fikri Aji
Genta Indra Winata
Samuel Cahyawijaya
Anuoluwapo Aremu
Perez Ogayo
Graham Neubig
11
26
0
25 May 2023
Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation
Haonan Li
Fajri Koto
Minghao Wu
Alham Fikri Aji
Timothy Baldwin
ALM
14
73
0
24 May 2023
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer
Akari Asai
Sneha Kudugunta
Xinyan Velocity Yu
Terra Blevins
Hila Gonen
Machel Reid
Yulia Tsvetkov
Sebastian Ruder
Hannaneh Hajishirzi
15
53
0
24 May 2023
InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning
Samuel Cahyawijaya
Holy Lovenia
Tiezheng Yu
Willy Chung
Pascale Fung
ALM
39
14
0
23 May 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects
Verena Blaschke
Hinrich Schütze
Barbara Plank
9
13
0
19 Apr 2023
Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
Zheng-Xin Yong
Ruochen Zhang
Jessica Zosa Forde
Skyler Wang
Arjun Subramonian
...
Yinghua Tan
Long Phan
Rowena Garcia
Thamar Solorio
Alham Fikri Aji
LRM
41
28
0
23 Mar 2023
Fairness in Language Models Beyond English: Gaps and Challenges
Krithika Ramesh
Sunayana Sitaram
Monojit Choudhury
14
23
0
24 Feb 2023
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Yejin Bang
Samuel Cahyawijaya
Nayeon Lee
Wenliang Dai
Dan Su
...
Tiezheng Yu
Willy Chung
Quyet V. Do
Yan Xu
Pascale Fung
ReLM
LRM
11
1,311
0
08 Feb 2023
A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies
A. Seza Doğruöz
Sunayana Sitaram
Barbara E. Bullock
Almeida Jacqueline Toribio
60
72
0
05 Jan 2023
SERENGETI: Massively Multilingual Language Models for Africa
Ife Adebara
AbdelRahim Elmadany
Muhammad Abdul-Mageed
Alcides Alcoba Inciarte
12
29
0
21 Dec 2022
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
Genta Indra Winata
Alham Fikri Aji
Zheng-Xin Yong
Thamar Solorio
22
31
0
19 Dec 2022
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
13
47
0
19 Dec 2022
Multilingual Relation Classification via Efficient and Effective Prompting
Yuxuan Chen
David Harbecke
Leonhard Hennig
LRM
19
11
0
25 Oct 2022
Rethinking Round-Trip Translation for Machine Translation Evaluation
Terry Yue Zhuo
Qiongkai Xu
Xuanli He
Trevor Cohn
LRM
9
2
0
15 Sep 2022
NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages
Samuel Cahyawijaya
Alham Fikri Aji
Holy Lovenia
Genta Indra Winata
Bryan Wilie
...
Fajri Koto
David Moeljadi
Karissa Vincentio
Ade Romadhony
Ayu Purwarianti
24
5
0
21 Jul 2022
Location-based Twitter Filtering for the Creation of Low-Resource Language Datasets in Indonesian Local Languages
Mukhlis Amien
Chong Feng
Heyan Huang
8
3
0
15 Jun 2022
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Genta Indra Winata
Alham Fikri Aji
Samuel Cahyawijaya
Rahmad Mahendra
Fajri Koto
...
Pascale Fung
Timothy Baldwin
Jey Han Lau
Rico Sennrich
Sebastian Ruder
21
77
0
31 May 2022
Masader: Metadata Sourcing for Arabic Text and Speech Data Resources
Zaid Alyafeai
Maraim Masoud
Mustafa Ghaleb
Maged S. Al-Shaibani
31
21
0
13 Oct 2021
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
90
167
0
28 Sep 2021
'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers
Timothy Baldwin
Kobi Leins
102
64
0
14 Sep 2021
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
Fajri Koto
Jey Han Lau
Timothy Baldwin
VLM
52
82
0
10 Sep 2021
Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences
Genta Indra Winata
Andrea Madotto
Chien-Sheng Wu
Pascale Fung
SyDa
124
92
0
18 Sep 2019