GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages


31 October 2024
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
    VLM

Papers citing "GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages"

3 / 3 papers shown
Improving Informally Romanized Language Identification
Adrian Benton
Alexander Gutkin
Christo Kirov
Brian Roark
40
0
0
30 Apr 2025
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs
Jaap Jumelet
Leonie Weissweiler
Arianna Bisazza
38
2
0
03 Apr 2025
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
Yingli Shen
Wen Lai
Shuo Wang
Xueren Zhang
Kangyang Luo
Alexander M. Fraser
Maosong Sun
47
0
0
17 Feb 2025