ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.04662
  4. Cited By
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

9 September 2023
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Christopher A. Choquette-Choo
Katherine Lee
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
ArXivPDFHTML

Papers citing "MADLAD-400: A Multilingual And Document-Level Large Audited Dataset"

21 / 21 papers shown
Title
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
45
0
0
03 May 2025
Improving Informally Romanized Language Identification
Improving Informally Romanized Language Identification
Adrian Benton
Alexander Gutkin
Christo Kirov
Brian Roark
43
0
0
30 Apr 2025
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
Yingli Shen
Wen Lai
Shuo Wang
Xueren Zhang
Kangyang Luo
Alexander M. Fraser
Maosong Sun
47
0
0
17 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
91
0
0
05 Feb 2025
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Menglong Cui
Pengzhi Gao
Wei Liu
Jian Luan
Bin Wang
LRM
41
0
0
04 Feb 2025
LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning
LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning
Shuguang Chen
Guang Lin
LRM
58
0
0
28 Dec 2024
Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service
Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service
Raphael Merx
Hanna Suominen
Adérito José Guterres Correia
Trevor Cohn
65
1
0
19 Nov 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schutze
VLM
36
5
0
31 Oct 2024
Data Processing for the OpenGPT-X Model Family
Data Processing for the OpenGPT-X Model Family
Nicolo' Brandizzi
Hammam Abdelwahab
Anirban Bhowmick
Lennard Helmer
Benny Jörg Stein
...
Georg Rehm
Dennis Wegener
Nicolas Flores-Herr
Joachim Kohler
Johannes Leveling
VLM
79
2
0
11 Oct 2024
Efficiently Identifying Low-Quality Language Subsets in Multilingual
  Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Farhan Samir
Emily P. Ahn
Shreya Prakash
Márton Soskuthy
Vered Shwartz
Jian Zhu
24
0
0
05 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
67
7
0
03 Oct 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
35
3
0
26 Sep 2024
Modular Sentence Encoders: Separating Language Specialization from
  Cross-Lingual Alignment
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment
Yongxin Huang
Kexin Wang
Goran Glavavs
Iryna Gurevych
44
0
0
20 Jul 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
77
9
0
14 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
Pontus Stenetorp
ELM
50
7
0
05 Jun 2024
The Mosaic Memory of Large Language Models
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
39
3
0
24 May 2024
Separating the Wheat from the Chaff with BREAD: An open-source benchmark
  and metrics to detect redundancy in text
Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell
Lisa Wang
Isabel Papadimitriou
23
0
0
11 Nov 2023
Beyond Distillation: Task-level Mixture-of-Experts for Efficient
  Inference
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta
Yanping Huang
Ankur Bapna
M. Krikun
Dmitry Lepikhin
Minh-Thang Luong
Orhan Firat
MoE
119
104
0
24 Sep 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
94
82
0
22 Sep 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
237
588
0
14 Jul 2021
A Baseline Neural Machine Translation System for Indian Languages
A Baseline Neural Machine Translation System for Indian Languages
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
50
17
0
29 Jul 2019
1