ResearchTrend.AI

Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages

15 May 2023
Chunlan Ma
Ayyoob Imani
Haotian Ye
Renhao Pei
Ehsaneddin Asgari
Hinrich Schütze

Papers citing "Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages"

21 / 21 papers shown
GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models
Hengyu Luo
Zihao Li
Joseph Attieh
Sawal Devkota
Ona de Gibert
...
Ananda Sreenidhi
Raúl Vázquez
Mengjie Wang
Samea Yusofi
Jörg Tiedemann
ELM
31
0
0
05 Apr 2025
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages
Chen Zhang
Mingxu Tao
Zhiyuan Liao
Yansong Feng
36
0
0
03 Mar 2025
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Edward Bayes
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Jonas Kgomo
Tyna Eloundou
...
Shamsuddeen Hassan Muhammad
Choice Mpanza
Igneciah Pocia Thete
Dietrich Klakow
David Ifeoluwa Adelani
HILM
ELM
63
6
0
01 Dec 2024
LangSAMP: Language-Script Aware Multilingual Pretraining
Yihong Liu
Haotian Ye
Chunlan Ma
Mingyang Wang
Hinrich Schütze
VLM
24
0
0
26 Sep 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
35
3
0
26 Sep 2024
SpeechTaxi: On Multilingual Semantic Speech Classification
Lennart Keller
Goran Glavaš
26
0
0
10 Sep 2024
Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts
Chunlan Ma
Yihong Liu
Haotian Ye
Hinrich Schütze
21
1
0
02 Jul 2024
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
Orgest Xhelili
Yihong Liu
Hinrich Schütze
26
6
0
28 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
ELM
50
6
0
05 Jun 2024
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
Yihong Liu
Chunlan Ma
Haotian Ye
Hinrich Schütze
29
4
0
16 May 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Yinghui Li
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
47
35
0
07 Apr 2024
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
Osvaldo Luamba Quinjica
David Ifeoluwa Adelani
22
0
0
03 Apr 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Yuemei Xu
Ling Hu
Jiayi Zhao
Zihan Qiu
Yuqi Ye
Hanwen Gu
LRM
19
36
0
01 Apr 2024
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
Bolei Ma
Ercong Nie
Shuzhou Yuan
Helmut Schmid
Michael Färber
Frauke Kreuter
Hinrich Schütze
VLM
95
4
0
29 Jan 2024
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
Yihong Liu
Chunlan Ma
Haotian Ye
Hinrich Schütze
23
1
0
12 Jan 2024
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Yihong Liu
Peiqin Lin
Mingyang Wang
Hinrich Schütze
16
21
0
15 Nov 2023
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
David Ifeoluwa Adelani
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
ELM
22
59
0
14 Sep 2023
Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding
Bolei Ma
Ercong Nie
Helmut Schmid
Hinrich Schütze
AAML
VLM
LRM
18
8
0
15 Jul 2023
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
Yihong Liu
Haotian Ye
Leonie Weissweiler
Renhao Pei
Hinrich Schütze
25
10
0
22 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
29
95
0
20 May 2023
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
242
489
0
16 Oct 2019