arXiv:2305.12182
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
20 May 2023
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
Nora Kassner
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
Papers citing
"Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages"
30 / 80 papers shown
Title
Constrained Decoding for Cross-lingual Label Projection
Duong Minh Le
Yang Chen
Alan Ritter
Wei-ping Xu
14
6
0
05 Feb 2024
"It's how you do things that matters": Attending to Process to Better Serve Indigenous Communities with Language Technologies
Ned Cooper
Courtney Heldreth
Ben Hutchinson
11
7
0
04 Feb 2024
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
Bolei Ma
Ercong Nie
Shuzhou Yuan
Helmut Schmid
Michael Färber
Frauke Kreuter
Hinrich Schütze
VLM
95
4
0
29 Jan 2024
MaLA-500: Massive Language Adaptation of Large Language Models
Peiqin Lin
Shaoxiong Ji
Jörg Tiedemann
André F. T. Martins
Hinrich Schütze
ELM
15
15
0
24 Jan 2024
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
Yihong Liu
Chunlan Ma
Haotian Ye
Hinrich Schütze
15
1
0
12 Jan 2024
Towards Conversational Diagnostic AI
Tao Tu
Anil Palepu
M. Schaekermann
Khaled Saab
Jan Freyberg
...
Katherine Chou
Greg S. Corrado
Yossi Matias
Alan Karthikesalingam
Vivek Natarajan
AI4MH
LM&MA
15
87
0
11 Jan 2024
MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer
Haotian Ye
Yihong Liu
Chunlan Ma
Hinrich Schütze
13
0
0
09 Jan 2024
Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?
Tannon Kew
Florian Schottmann
Rico Sennrich
LRM
13
34
0
20 Dec 2023
TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
Bibek Upadhayay
Vahid Behzadan
8
7
0
17 Nov 2023
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
20
7
0
15 Nov 2023
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Yihong Liu
Peiqin Lin
Mingyang Wang
Hinrich Schütze
11
21
0
15 Nov 2023
GlotLID: Language Identification for Low-Resource Languages
Amir Hossein Kargaran
Ayyoob Imani
François Yvon
Hinrich Schütze
12
10
0
24 Oct 2023
Exploring the Maze of Multilingual Modeling
Sina Bagheri Nezhad
Ameeta Agrawal
9
1
0
09 Oct 2023
CebuaNER: A New Baseline Cebuano Named Entity Recognition Model
Ma. Beatrice Emanuela Pilar
Ellyza Mari Papas
Mary Loise Buenaventura
Dane Dedoroy
M. D. Montefalcon
Jay Rhald Padilla
Lany L. Maceda
Mideth B. Abisado
Joseph Marvin Imperial
8
1
0
01 Oct 2023
GlotScript: A Resource and Tool for Low Resource Writing System Identification
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
9
10
0
23 Sep 2023
Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents
Ramona Christen
Anastassia Shaitarova
Matthias Stürmer
Joel Niklaus
AILaw
ELM
20
3
0
15 Sep 2023
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
David Ifeoluwa Adelani
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
ELM
14
59
0
14 Sep 2023
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
Xinyi Wang
John Wieting
J. Clark
CLL
ALM
8
1
0
09 Sep 2023
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Christopher A. Choquette-Choo
...
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
44
117
0
09 Sep 2023
Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations
Leonardo Ranaldi
Giulia Pucci
André Freitas
17
33
0
27 Aug 2023
Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding
Bolei Ma
Ercong Nie
Helmut Schmid
Hinrich Schütze
AAML
VLM
LRM
13
8
0
15 Jul 2023
MultiLegalPile: A 689GB Multilingual Legal Corpus
Joel Niklaus
Veton Matoshi
Matthias Stürmer
Ilias Chalkidis
Daniel E. Ho
AILaw
ELM
14
20
0
03 Jun 2023
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
Peiqin Lin
Chengzhi Hu
Zheyu Zhang
André F. T. Martins
Hinrich Schütze
14
1
0
23 May 2023
Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction
Yang Chen
Vedaant Shah
Alan Ritter
21
2
0
23 May 2023
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
Yihong Liu
Haotian Ye
Leonie Weissweiler
Renhao Pei
Hinrich Schütze
17
10
0
22 May 2023
Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages
Chunlan Ma
Ayyoob Imani
Haotian Ye
Renhao Pei
Ehsaneddin Asgari
Hinrich Schütze
14
23
0
15 May 2023
NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis
Mingyang Wang
Heike Adel
Lukas Lange
Jannik Strötgen
Hinrich Schütze
47
15
0
28 Apr 2023
Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages
Silvia Severini
Ayyoob Imani
Philipp Dufter
Hinrich Schütze
23
7
0
28 Jan 2022
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
56
0
24 Oct 2020
JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita
Jun Suzuki
Masaaki Nagata
LRM
28
64
0
25 Nov 2019