ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
  • Feedback
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.00899
  4. Cited By
YODAS: Youtube-Oriented Dataset for Audio and Speech

YODAS: Youtube-Oriented Dataset for Audio and Speech

2 June 2024
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
ArXiv (abs)PDFHTMLHuggingFace (3 upvotes)

Papers citing "YODAS: Youtube-Oriented Dataset for Audio and Speech"

21 / 21 papers shown
Title
BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
Liuyuan Jiang
Xiaodong Cui
Brian Kingsbury
Tianyi Chen
Lisha Chen
SSL
24
0
0
18 Sep 2025
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
Brian Yan
Injy Hamed
Shuichiro Shimizu
Vasista Lodagala
William Chen
...
Samuele Cornell
Eunjung Yeo
Kwanghee Choi
Carlos Carvalho
Karen Rosero
0
0
0
17 Sep 2025
OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
Huong Ngo
Matt Deitke
Martijn Bartelds
Sarah M Pratt
Josh Gardner
Matt Jordan
Ludwig Schmidt
7
0
0
28 Aug 2025
CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese
CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese
Carlos Carvalho
Francisco Teixeira
Catarina Botelho
Anna Pompili
Rubén Solera-Ureña
...
T. Rolland
John Mendonça
Diogo Pereira
Isabel Trancoso
A. Abad
16
0
0
27 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLMAI4TSVLM
90
3
0
06 Aug 2025
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
William Ravenscroft
George Close
Kit Bower-Morris
Jamie Stacey
Dmitry Sityaev
Kris Y. Hong
49
0
0
29 Jul 2025
GLAP: General contrastive audio-text pretraining across domains and languages
GLAP: General contrastive audio-text pretraining across domains and languages
Heinrich Dinkel
Zhiyong Yan
Tianzi Wang
Yongqing Wang
Xingwei Sun
Yadong Niu
Jizhong Liu
Gang Li
Junbo Zhang
Jian Luan
CLIPVLM
91
3
0
12 Jun 2025
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
Jesujoba Oluwadara Alabi
Michael A. Hedderich
David Ifeoluwa Adelani
Dietrich Klakow
238
2
0
27 May 2025
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Titouan Parcollet
Yuan Tseng
Shucong Zhang
Rogier van Dalen
84
1
0
27 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
205
0
0
23 May 2025
Granary: Speech Recognition and Translation Dataset in 25 European Languages
Granary: Speech Recognition and Translation Dataset in 25 European Languages
Nithin Rao Koluguri
Monica Sekoyan
George Zelenfroynd
Sasha Meister
Shuoyang Ding
...
Yifan Peng
Sara Papi
Marco Gaido
Alessio Brutti
Boris Ginsburg
99
1
0
19 May 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
207
5
0
26 Mar 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Longji Xu
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
197
11
0
28 Jan 2025
Distilling an End-to-End Voice Assistant Without Instruction Training
  Data
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
126
20
0
03 Oct 2024
FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs
FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs
Hitoshi Suda
Shunsuke Yoshida
Tomohiko Nakamura
Satoru Fukayama
Jun Ogata
92
2
0
19 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
224
9
0
17 Sep 2024
Enhancing Large Language Model-based Speech Recognition by
  Contextualization for Rare and Ambiguous Words
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa
Takashi Masuko
Toru Taniguchi
94
1
0
15 Aug 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
185
51
0
20 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
195
29
0
30 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
218
20
0
17 Jun 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on
  E-Branchformer
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLMOSLM
127
70
0
30 Jan 2024
1