Mapping Languages: The Corpus of Global Language Use


2 April 2020
Jonathan Dunn

Papers citing "Mapping Languages: The Corpus of Global Language Use"

38 / 38 papers shown
Title
ConLID: Supervised Contrastive Learning for Low-Resource Language Identification
Negar Foroutan
Jakhongir Saydaliev
Ye Eun Kim
Antoine Bosselut
5
0
0
18 Jun 2025
Stronger Together: Unleashing the Social Impact of Hate Speech Research
Sidney Wong
21
0
0
19 May 2025
Detecting Linguistic Diversity on Social Media
Sidney Gig-Jan Wong
Benjamin Adams
Jonathan Dunn
91
0
0
28 Feb 2025
Large corpora and large language models: a replicable method for automating grammatical annotation
Cameron Morin
Matti Marttinen Larsson
121
1
0
18 Nov 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
VLM
121
8
0
31 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Björn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
71
1
0
17 Oct 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
117
7
0
26 Sep 2024
Goldfish: Monolingual Language Models for 350 Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
129
10
0
19 Aug 2024
Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on Social Media
Sidney G.-J. Wong
51
0
0
01 Jul 2024
MaskLID: Code-Switching Language Identification through Iterative Masking
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
61
2
0
10 Jun 2024
Pre-Trained Language Models Represent Some Geographic Populations Better Than Others
Jonathan Dunn
Benjamin Adams
Harish Tayyar Madabushi
56
4
0
16 Mar 2024
Geographically-Informed Language Identification
Jonathan Dunn
Lane Edwards-Brown
61
3
0
14 Mar 2024
Validating and Exploring Large Geographic Corpora
Jonathan Dunn
63
0
0
13 Mar 2024
Code-Switched Language Identification is Harder Than You Think
Laurie Burchell
Alexandra Birch
Robert P. Thompson
Kenneth Heafield
57
0
0
02 Feb 2024
cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages
Sidney Gig-Jan Wong
Matthew Durward
43
0
0
28 Jan 2024
GlotLID: Language Identification for Low-Resource Languages
Amir Hossein Kargaran
Ayyoob Imani
François Yvon
Hinrich Schütze
112
15
0
24 Oct 2023
Syntactic Variation Across the Grammar: Modelling a Complex Adaptive System
Jonathan Dunn
46
3
0
21 Sep 2023
Comparing Measures of Linguistic Diversity Across Social Media Language Data and Census Data at Subnational Geographic Areas
Sidney Gig-Jan Wong
Jonathan Dunn
B. Adams
60
1
0
21 Aug 2023
cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models
Sidney Gig-Jan Wong
Matthew Durward
Benjamin Adams
Jonathan Dunn
27
7
0
20 Aug 2023
An Open Dataset and Model for Language Identification
Laurie Burchell
Alexandra Birch
Nikolay Bogoychev
Kenneth Heafield
70
36
0
23 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
134
107
0
20 May 2023
Variation and Instability in Dialect-Based Embedding Spaces
Jonathan Dunn
71
3
0
27 Mar 2023
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages
Chris C. Emezue
Sanchit Gandhi
Lewis Tunstall
Abubakar Abid
Josh Meyer
...
Douwe Kiela
Yacine Jernite
Julien Chaumond
Merve Noyan
Omar Sanseviero
72
2
0
22 Mar 2023
Exploring the Constructicon: Linguistic Analysis of a Computational CxG
Jonathan Dunn
66
5
0
30 Jan 2023
Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages
Jonathan Dunn
102
8
0
25 Nov 2022
AfroLID: A Neural Language Identification Tool for African Languages
Ife Adebara
AbdelRahim Elmadany
Muhammad Abdul-Mageed
Alcides Alcoba Inciarte
106
33
0
21 Oct 2022
Register Variation Remains Stable Across 60 Languages
Haipeng Li
Jonathan Dunn
A. Nini
92
9
0
20 Sep 2022
Stability of Syntactic Dialect Classification Over Space and Time
Jonathan Dunn
Sidney Gig-Jan Wong
53
5
0
11 Sep 2022
Corpus Similarity Measures Remain Robust Across Diverse Languages
Haipeng Li
Jonathan Dunn
59
7
0
09 Jun 2022
Predicting Embedding Reliability in Low-Resource Settings Using Corpus Similarity Measures
Jonathan Dunn
Haipeng Li
Damian Sastre
49
5
0
09 Jun 2022
Language Identification for Austronesian Languages
Jonathan Dunn
Wikke Nijhof
54
6
0
09 Jun 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhiwen Chen
Yonghui Wu
Macduff Hughes
121
101
0
09 May 2022
Learned Construction Grammars Converge Across Registers Given Increased Exposure
Jonathan Dunn
Harish Tayyar Madabushi
63
8
0
12 Oct 2021
Capturing the diversity of multilingual societies
Thomas Louf
David Sánchez
J. Ramasco
41
14
0
06 May 2021
Representations of Language Varieties Are Reliable Given Corpus Similarity Measures
J. Dunn
41
11
0
03 Apr 2021
Measuring Linguistic Diversity During COVID-19
Artaches Ambartsoumian
F. Popowich
Benjamin Adams
72
35
0
03 Apr 2021
Disembodied Machine Learning: On the Illusion of Objectivity in NLP
Zeerak Talat
Smarika Lulz
Joachim Bingel
Isabelle Augenstein
167
51
0
28 Jan 2021
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Isaac Caswell
Theresa Breiner
D. Esch
Ankur Bapna
92
90
0
27 Oct 2020