2112.03497
Dataset Geography: Mapping Language Data to Language Users
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
7 December 2021
Fahim Faisal
Yinkai Wang
Antonios Anastasopoulos
Papers citing
"Dataset Geography: Mapping Language Data to Language Users"
19 / 19 papers shown
Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge
Eshaan Tanwar
Anwoy Chatterjee
Michael Stephen Saxon
Alon Albalak
William Wang
Tanmoy Chakraborty
168
2
0
01 Nov 2025
Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+
York Hay Ng
Aditya Khan
Xiang Lu
Matteo Salloum
Michael Zhou
Phuong H. Hoang
A. Seza Doğruöz
En-Shiun Annie Lee
199
1
0
22 Oct 2025
Bridging Cultural Distance Between Models Default and Local Classroom Demands: How Global Teachers Adopt GenAI to Support Everyday Teaching Practices
Ruiwei Xiao
Qing Xiao
Xinying Hou
Hanqi Li
Phenyo Phemelo Moletsane
Hong Shen
John Stamper
213
1
0
13 Sep 2025
Conflicts in Texts: Data, Implications and Challenges
Siyi Liu
Dan Roth
1.0K
1
0
28 Apr 2025
DEPT: Decoupled Embeddings for Pre-training Language Models
International Conference on Learning Representations (ICLR), 2024
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
1.4K
2
0
07 Oct 2024
Worldwide Federated Training of Language Models
Alexandru Iacob
Lorenzo Sani
Bill Marino
Preslav Aleksandrov
William F. Shen
Nicholas D. Lane
FedML
453
7
0
23 May 2024
The Future of Large Language Model Pre-training is Federated
Lorenzo Sani
Alexandru Iacob
Zeyu Cao
Bill Marino
Yan Gao
...
Wanru Zhao
William F. Shen
Preslav Aleksandrov
Xinchi Qiu
Nicholas D. Lane
AI4CE
503
43
0
17 May 2024
Validating and Exploring Large Geographic Corpora
International Conference on Language Resources and Evaluation (LREC), 2024
Jonathan Dunn
220
0
0
13 Mar 2024
On the Scaling Laws of Geographical Representation in Language Models
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
351
12
0
29 Feb 2024
Towards Better Inclusivity: A Diverse Tweet Corpus of English Varieties
Linguistic Annotation Workshop (LAW), 2024
Nhi Pham
Lachlan Pham
Adam L. Meyers
152
4
0
21 Jan 2024
A Material Lens on Coloniality in NLP
William B. Held
Camille Harris
Michael Best
Diyi Yang
429
22
0
14 Nov 2023
SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning
Neural Information Processing Systems (NeurIPS), 2023
Yunxiang Zhang
Xiaojun Wan
AILaw
LRM
307
10
0
21 Jun 2023
Geographic and Geopolitical Biases of Language Models
Fahim Faisal
Antonios Anastasopoulos
285
32
0
20 Dec 2022
TaTa: A Multilingual Table-to-Text Dataset for African Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Sebastian Gehrmann
Sebastian Ruder
Vitaly Nikolaev
Jan A. Botha
Michael Chavinda
Ankur P. Parikh
Clara E. Rivera
LMTD
383
14
0
31 Oct 2022
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
David Ifeoluwa Adelani
Graham Neubig
Sebastian Ruder
Shruti Rijhwani
Michael Beukman
...
Idris Abdulmumin
Odunayo Ogundepo
Oreen Yousuf
Tatiana Moteu Ngoli
Dietrich Klakow
332
62
0
22 Oct 2022
Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World
Surangika Ranathunga
Nisansa de Silva
331
62
0
16 Oct 2022
GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Da Yin
Hritik Bansal
Masoud Monajatipoor
Liunian Harold Li
Kai-Wei Chang
285
39
0
24 May 2022
Graph-based Ensemble Machine Learning for Student Performance Prediction
Yinkai Wang
A. Ding
Kaiyi Guan
Shixi Wu
Yuanqi Du
214
7
0
15 Dec 2021
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Transactions of the Association for Computational Linguistics (TACL), 2020
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
730
718
0
10 Mar 2020