
The State and Fate of Linguistic Diversity and Inclusion in the NLP World

20 April 2020
Pratik M. Joshi
Sebastin Santy
A. Budhiraja
Kalika Bali
Monojit Choudhury
    LMTD

Papers citing "The State and Fate of Linguistic Diversity and Inclusion in the NLP World"

50 / 139 papers shown
Title
PeLLE: Encoder-based language models for Brazilian Portuguese based on open data
Guilherme Lamartine de Mello
Marcelo Finger
F. Serras
M. Carpi
Marcos Menon Jose
Pedro Henrique Domingues
Paulo Cavalim
27
0
0
29 Feb 2024
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
40
10
0
16 Nov 2023
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
Wei-Rui Chen
Ife Adebara
Khai Duy Doan
Qisheng Liao
Muhammad Abdul-Mageed
17
5
0
16 Nov 2023
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
28
7
0
15 Nov 2023
Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models
J. Michaelov
Catherine Arnett
Tyler A. Chang
Benjamin Bergen
36
12
0
15 Nov 2023
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Sanchit Ahuja
Divyanshu Aggarwal
Varun Gumma
Ishaan Watts
Ashutosh Sathe
...
Rishav Hada
Prachi Jain
Maxamed Axmed
Kalika Bali
Sunayana Sitaram
ELM
32
39
0
13 Nov 2023
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur
Jianmo Ni
Gustavo Hernández Ábrego
John Wieting
Jimmy J. Lin
Daniel Matthew Cer
RALM
29
12
0
10 Nov 2023
Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering
Ofir Arviv
Dmitry Nikolaev
Taelin Karidi
Omri Abend
LRM
30
3
0
20 Oct 2023
EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge
Tom Bryan
Jacob Carlson
Abhishek Arora
Melissa Dell
23
8
0
16 Oct 2023
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer
Mirac Suzgun
Chenguang Xi
Dan Jurafsky
Luke Melas-Kyriazi
24
51
0
28 Sep 2023
BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP
M. Kabir
Mohammed Saidul Islam
Md Tahmid Rahman Laskar
Mir Tafseer Nayeem
M Saiful Bari
Enamul Hoque
LM&MA
24
15
0
22 Sep 2023
Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems
Songbo Hu
Han Zhou
Mete Hergul
Milan Gritta
Guchun Zhang
Ignacio Iacobacci
Ivan Vulić
Anna Korhonen
28
10
0
26 Jul 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
21
5
0
01 Jun 2023
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer
Akari Asai
Sneha Kudugunta
Xinyan Velocity Yu
Terra Blevins
Hila Gonen
Machel Reid
Yulia Tsvetkov
Sebastian Ruder
Hannaneh Hajishirzi
31
54
0
24 May 2023
LLM-powered Data Augmentation for Enhanced Cross-lingual Performance
Chenxi Whitehouse
Monojit Choudhury
Alham Fikri Aji
SyDa
LRM
30
68
0
23 May 2023
SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction
Jun-Yu Ma
Jia-Chen Gu
Zhen-Hua Ling
Quan Liu
Cong Liu
Guoping Hu
51
1
0
21 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
31
95
0
20 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip H. S. Torr
Adel Bibi
16
96
0
17 May 2023
Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages
Chunlan Ma
Ayyoob Imani
Haotian Ye
Renhao Pei
Ehsaneddin Asgari
Hinrich Schütze
27
23
0
15 May 2023
How Good are Commercial Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
19
5
0
11 May 2023
Train Global, Tailor Local: Minimalist Multilingual Translation into Endangered Languages
Zhong Zhou
J. Niehues
Alexander Waibel
22
0
0
05 May 2023
GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters
Md Mahfuz Ibn Alam
Ruoyu Xie
Fahim Faisal
Antonios Anastasopoulos
30
3
0
25 Apr 2023
Transcending the "Male Code": Implicit Masculine Biases in NLP Contexts
Katie Seaborn
Shruti Chandra
Thibault Fabre
21
11
0
22 Apr 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects
Verena Blaschke
Hinrich Schütze
Barbara Plank
19
13
0
19 Apr 2023
Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese
Vésteinn Snaebjarnarson
A. Simonsen
Goran Glavaš
Ivan Vulić
35
19
0
18 Apr 2023
Efficient OCR for Building a Diverse Digital History
Jacob Carlson
Tom Bryan
Melissa Dell
23
11
0
05 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
20
42
0
31 Mar 2023
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages
Chris C. Emezue
Sanchit Gandhi
Lewis Tunstall
Abubakar Abid
Josh Meyer
...
Douwe Kiela
Yacine Jernite
Julien Chaumond
Merve Noyan
Omar Sanseviero
25
2
0
22 Mar 2023
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer
Shanu Kumar
Abbaraju Soujanya
Sandipan Dandapat
Sunayana Sitaram
Monojit Choudhury
VLM
25
1
0
04 Mar 2023
The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia
Yamini Bansal
Colin Cherry
George F. Foster
M. Krikun
Fan Feng
Melvin Johnson
Orhan Firat
27
102
0
02 Feb 2023
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
Genta Indra Winata
Alham Fikri Aji
Zheng-Xin Yong
Thamar Solorio
37
33
0
19 Dec 2022
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Zheng-Xin Yong
Hailey Schoelkopf
Niklas Muennighoff
Alham Fikri Aji
David Ifeoluwa Adelani
...
Genta Indra Winata
Stella Biderman
Edward Raff
Dragomir R. Radev
Vassilina Nikoulina
CLL
VLM
AI4CE
LRM
27
81
0
19 Dec 2022
POTATO: The Portable Text Annotation Tool
Jiaxin Pei
Aparna Ananthasubramaniam
Xingyao Wang
Naitian Zhou
Jackson Sargent
Apostolos Dedeloudis
David Jurgens
VLM
19
58
0
16 Dec 2022
Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources
Xinyan Velocity Yu
Akari Asai
Trina Chatterjee
Junjie Hu
Eunsol Choi
16
21
0
28 Nov 2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
27
5
0
27 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
23
10
0
20 Oct 2022
Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World
Surangika Ranathunga
Nisansa de Silva
29
34
0
16 Oct 2022
The Ethical Risks of Analyzing Crisis Events on Social Media with Machine Learning
Angelie Kraft
Ricardo Usbeck
14
4
0
07 Oct 2022
A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit
Jivnesh Sandhan
Ashish Gupta
Hrishikesh Terdalkar
Tushar Sandhan
S. Samanta
Laxmidhar Behera
Pawan Goyal
11
3
0
22 Aug 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes Altuncu
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
34
3
0
11 Aug 2022
On the Limitations of Sociodemographic Adaptation with Transformers
Chia-Chien Hung
Anne Lauscher
Dirk Hovy
Simone Paolo Ponzetto
Goran Glavaš
19
0
0
01 Aug 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
24
10
0
25 Jul 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
30
46
0
14 Jul 2022
KOLD: Korean Offensive Language Dataset
Young-kuk Jeong
Juhyun Oh
Jaimeen Ahn
Jongwon Lee
Jihyung Moon
Sungjoon Park
Alice H. Oh
40
25
0
23 May 2022
Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT
Beiduo Chen
Wu Guo
Quan Liu
Kun Tao
29
1
0
17 May 2022
MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi
Aryaman Arora
N. Venkateswaran
Nathan Schneider
13
4
0
08 May 2022
You Are What You Write: Preserving Privacy in the Era of Large Language Models
Richard Plant
V. Giuffrida
Dimitra Gkatzia
PILM
17
19
0
20 Apr 2022
Multilingual Event Linking to Wikidata
Adithya Pratapa
Rishubh Gupta
Teruko Mitamura
19
7
0
13 Apr 2022
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages
Gokul Karthik Kumar
Abhishek Singh Gehlot
Sahal Shaji Mullappilly
Karthik Nandakumar
26
13
0
12 Apr 2022
MMTAfrica: Multilingual Machine Translation for African Languages
Chris C. Emezue
Bonaventure F. P. Dossou
19
24
0
08 Apr 2022