ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.06619
  4. Cited By
Aya Dataset: An Open-Access Collection for Multilingual Instruction
  Tuning

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

9 February 2024
Shivalika Singh
Freddie Vargus
Daniel D'souza
Börje F. Karlsson
Abinaya Mahendiran
Wei-Yin Ko
Herumb Shandilya
Jay Patel
Deividas Mataciunas
Laura OMahony
Mike Zhang
Ramith Hettiarachchi
Joseph Wilson
Marina Machado
Luisa Souza Moura
Dominik Krzemiñski
Hakimeh Fadaei
Irem Ergun
Ifeoma Okoh
Aisha Alaagib
Oshan Mudannayake
Zaid Alyafeai
Minh Chien Vu
Sebastian Ruder
Surya Guthikonda
Emad A. Alghamdi
Sebastian Gehrmann
Niklas Muennighoff
Max Bartolo
Julia Kreutzer
A. Ustun
Marzieh Fadaee
Sara Hooker
ArXivPDFHTML

Papers citing "Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning"

25 / 25 papers shown
Title
CARE: Aligning Language Models for Regional Cultural Awareness
CARE: Aligning Language Models for Regional Cultural Awareness
Geyang Guo
Tarek Naous
Hiromi Wakaki
Yukiko Nishimura
Yuki Mitsufuji
Alan Ritter
Wei-ping Xu
44
0
0
07 Apr 2025
The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
Olivier Gouvert
Julie Hunter
Jérôme Louradour
Christophe Cerisara
Evan Dufraisse
Yaya Sy
Laura Rivière
Jean-Pierre Lorré
OpenLLM-France community
58
0
0
15 Mar 2025
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Vera Neplenbroek
Arianna Bisazza
Raquel Fernández
88
0
0
17 Feb 2025
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
Omnilingual MT Team
Pierre Yves Andrews
Mikel Artetxe
Mariano Coria Meglioli
Marta R. Costa-jussá
...
Eduardo Sánchez
Ioannis Tsiamas
Arina Turkatenko
Albert Ventayol-Boada
Shireen Yates
93
0
0
06 Feb 2025
AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought
AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought
Xin Huang
Tarun K. Vangani
Zhengyuan Liu
Bowei Zou
A. Aw
LRM
AI4CE
43
0
0
27 Jan 2025
Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages
Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages
Hoang Nguyen
Khyati Mahajan
Vikas Yadav
Philip S. Yu
Masoud Hashemi
Rishabh Maheshwary
Rishabh Maheshwary
31
0
0
04 Nov 2024
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar
Benjamin Muller
Pritish Yuvraj
Rui Hou
Nayan Singhal
Hongjiang Lv
Bing-Quan Liu
KELM
LRM
MoMe
16
2
0
02 Oct 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
23
3
0
26 Sep 2024
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
Nikhil Sharma
Kenton Murray
Ziang Xiao
34
1
0
07 Jul 2024
A Principled Framework for Evaluating on Typologically Diverse Languages
A Principled Framework for Evaluating on Typologically Diverse Languages
Esther Ploeger
Wessel Poelman
Andreas Holck Høeg-Petersen
Anders Schlichtkrull
Miryam de Lhoneux
Johannes Bjerva
23
1
0
06 Jul 2024
Understanding and Mitigating Language Confusion in LLMs
Understanding and Mitigating Language Confusion in LLMs
Kelly Marchisio
Wei-Yin Ko
Alexandre Berard
Théo Dehaze
Sebastian Ruder
39
23
0
28 Jun 2024
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
Rishabh Maheshwary
Vikas Yadav
Hoang Nguyen
Khyati Mahajan
Sathwik Tejaswi Madhusudhan
27
3
0
24 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
52
9
0
14 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
75
28
0
09 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
Pontus Stenetorp
ELM
44
6
0
05 Jun 2024
High-Dimension Human Value Representation in Large Language Models
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
38
5
0
11 Apr 2024
OLMo: Accelerating the Science of Language Models
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
124
349
0
01 Feb 2024
What Language Model to Train if You Have One Million GPU Hours?
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
212
103
0
27 Oct 2022
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End
  Question Answering
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
Priyanka Sen
Alham Fikri Aji
Amir Saffari
LRM
97
42
0
04 Oct 2022
SynthBio: A Case Study in Human-AI Collaborative Curation of Text
  Datasets
SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets
Ann Yuan
Daphne Ippolito
Vitaly Nikolaev
Chris Callison-Burch
Andy Coenen
Sebastian Gehrmann
SyDa
104
17
0
11 Nov 2021
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng
Arunesh Mathur
Arvind Narayanan
91
92
0
06 Aug 2021
Larger-Scale Transformers for Multilingual Masked Language Modeling
Larger-Scale Transformers for Multilingual Masked Language Modeling
Naman Goyal
Jingfei Du
Myle Ott
Giridhar Anantharaman
Alexis Conneau
88
125
0
02 May 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in
  NLP
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
199
167
0
18 Apr 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
239
489
0
16 Oct 2019
Teaching Machines to Read and Comprehend
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
167
3,357
0
10 Jun 2015
1