Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World

16 October 2022

Papers citing "Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World"


Title
NLP Security and Ethics, in the Wild Heather Lent Erick Galinkin Yiyi Chen Jens Myrup Pedersen Leon Derczynski Johannes Bjerva SILM 42 0 0 09 Apr 2025
A Framework to Assess Multilingual Vulnerabilities of LLMs Likai Tang Niruth Bogahawatta Yasod Ginige Jiarui Xu Shixuan Sun Surangika Ranathunga Suranga Seneviratne 37 0 0 17 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama Naome A. Etori Kevin Lu Randu Karisa Arturs Kanepajs LRM ELM 143 0 0 14 Mar 2025
Improving the quality of Web-mined Parallel Corpora of Low-Resource Languages using Debiasing Heuristics Aloka Fernando Surangika Ranathunga Nisansa de Silva 36 0 0 26 Feb 2025
The Call for Socially Aware Language Technologies Diyi Yang Dirk Hovy David Jurgens Barbara Plank VLM 51 11 0 24 Feb 2025
Unsupervised Bilingual Lexicon Induction for Low Resource Languages Charitha Rathnayake P. R. S. Thilakarathna Uthpala Nethmini Rishemjit Kaur Surangika Ranathunga 67 0 0 22 Dec 2024
A Multi-way Parallel Named Entity Annotated Corpus for English, Tamil and Sinhala Surangika Ranathunga Asanka Ranasinghe Janaka Shamal Ayodya Dandeniya Rashmi Galappaththi Malithi Samaraweera 68 0 0 03 Dec 2024
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book? Seth Aycock David Stap Di Wu Christof Monz Khalil Sima'an 29 2 0 27 Sep 2024
A global AI community requires language-diverse publishing Haley Lepp Parth Sarin 21 2 0 27 Aug 2024
Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research Surangika Ranathunga Nisansa de Silva Dilith Jayakody Aloka Fernando 27 2 0 10 Jun 2024
What Drives Performance in Multilingual Language Models? Sina Bagheri Nezhad Ameeta Agrawal LRM 35 9 0 29 Apr 2024
ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus Injy Hamed Fadhl Eryani David Palfreyman Nizar Habash 38 2 0 27 Mar 2024
Harnessing the power of LLMs for normative reasoning in MASs B. Savarimuthu Surangika Ranathunga Stephen Cranefield LLMAG 32 6 0 25 Mar 2024
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages Rik van Noord Taja Kuzman Peter Rupnik Nikola Ljubesic Miquel Espla-Gomis Gema Ramírez-Sánchez Antonio Toral ALM 27 1 0 13 Mar 2024
Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora Surangika Ranathunga Nisansa de Silva Menan Velayuthan Aloka Fernando Charitha Rathnayake 31 10 0 12 Feb 2024
Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language Kasun Wickramasinghe Nisansa de Silva 14 0 0 17 Nov 2023
Exploring the Maze of Multilingual Modeling Sina Bagheri Nezhad Ameeta Agrawal 16 1 0 09 Oct 2023
Sinhala-English Parallel Word Dictionary Dataset Kasun Wickramasinghe Nisansa de Silva 11 3 0 04 Aug 2023
The ACL OCL Corpus: Advancing Open Science in Computational Linguistics Shaurya Rohatgi Yanxia Qin Benjamin Aw Niranjana Unnithan Min-Yen Kan LMTD 18 12 0 24 May 2023
Preparing the Vukúzenzele and ZA-gov-multilingual South African multilingual corpora Richard Lastrucci Isheanesu Dzingirai Jenalea Rajab Andani Madodonga Matimba Shingange Daniel Njini Vukosi Marivate 20 14 0 07 Mar 2023
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models Harshita Diddee Sandipan Dandapat Monojit Choudhury T. Ganu Kalika Bali 27 5 0 27 Oct 2022
Dataset Geography: Mapping Language Data to Language Users Fahim Faisal Yinkai Wang Antonios Anastasopoulos 54 23 0 07 Dec 2021
Systematic Inequalities in Language Technology Performance across the World's Languages Damián E. Blasi Antonios Anastasopoulos Graham Neubig 111 131 0 13 Oct 2021
Neural Machine Translation for Low-Resource Languages: A Survey Surangika Ranathunga E. Lee Marjana Prifti Skenduli Ravi Shekhar Mehreen Alam Rishemjit Kaur 27 234 0 29 Jun 2021
AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages Abteen Ebrahimi Manuel Mager Arturo Oncevay Vishrav Chaudhary Luis Chiruzzo ... Graham Neubig Alexis Palmer Rolando A. Coto Solano Ngoc Thang Vu Katharina Kann 102 71 0 18 Apr 2021
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models Benjamin Muller Antonis Anastasopoulos Benoît Sagot Djamé Seddah LRM 124 165 0 24 Oct 2020
A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios Michael A. Hedderich Lukas Lange Heike Adel Jannik Strötgen Dietrich Klakow 194 286 0 23 Oct 2020
Keyphrase Extraction from Disaster-related Tweets Jishnu Ray Chowdhury Cornelia Caragea Doina Caragea 24 39 0 17 Oct 2019