YODAS: Youtube-Oriented Dataset for Audio and Speech

2 June 2024
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
Papers citing "YODAS: Youtube-Oriented Dataset for Audio and Speech"

28 / 28 papers shown
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
Chaoqun Liu
Mahani Aljunied
Guizhen Chen
Hou Pong Chan
Weiwen Xu
Yu Rong
Wenxuan Zhang
AuLLM
329
3
0
03 Nov 2025
NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion
Zongyang Du
Shreeram Suresh Chandra
Ismail Rasim Ulgen
Aurosweta Mahapatra
Ali N. Salman
Carlos Busso
Berrak Sisman
146
0
0
31 Oct 2025
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana
Pittawat Taveekitworachai
Warit Sirichotedumrong
Potsawee Manakul
Kunat Pipatanakul
AuLLM
155
0
0
17 Oct 2025
Thai Semantic End-of-Turn Detection for Real-Time Voice Agents
Thanapol Popit
Natthapath Rungseesiripak
Monthol Charattrakool
Saksorn Ruangtanusak
97
0
0
05 Oct 2025
EuroSpeech: A Multilingual Speech Corpus
Samuel Pfisterer
Florian Grötschla
Luca A. Lanzendörfer
Florian Yan
Roger Wattenhofer
143
0
0
01 Oct 2025
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Yuhan Song
Linhao Zhang
Chuhan Wu
Aiwei Liu
Wei Jia
Houfeng Wang
Xiao-bin Zhou
132
0
0
26 Sep 2025
WolBanking77: Wolof Banking Speech Intent Classification Dataset
Abdou Karim Kandji
Frédéric Precioso
Cheikh Ba
Samba Ndiaye
Augustin Ndione
217
0
0
23 Sep 2025
BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
Liuyuan Jiang
Xiaodong Cui
Brian Kingsbury
Tianyi Chen
Lisha Chen
SSL
133
0
0
18 Sep 2025
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
Brian Yan
Injy Hamed
Shuichiro Shimizu
Vasista Lodagala
William Chen
...
Samuele Cornell
Eunjung Yeo
Kwanghee Choi
Carlos Carvalho
Karen Rosero
144
4
0
17 Sep 2025
OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
Huong Ngo
Matt Deitke
Martijn Bartelds
Sarah M Pratt
Josh Gardner
Matt Jordan
Ludwig Schmidt
151
2
0
28 Aug 2025
CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese
Carlos Carvalho
Francisco Teixeira
Catarina Botelho
Anna Pompili
Rubén Solera-Ureña
...
T. Rolland
John Mendonça
Diogo Pereira
Isabel Trancoso
A. Abad
123
0
0
27 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, AI4TS, VLM
422
15
0
06 Aug 2025
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
William Ravenscroft
George Close
Kit Bower-Morris
Jamie Stacey
Dmitry Sityaev
Kris Y. Hong
211
1
0
29 Jul 2025
GLAP: General contrastive audio-text pretraining across domains and languages
Heinrich Dinkel
Zhiyong Yan
Tianzi Wang
Yongqing Wang
Xingwei Sun
Yadong Niu
Jizhong Liu
Gang Li
Junbo Zhang
Jian Luan
CLIP, VLM
217
5
0
12 Jun 2025
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
Jesujoba Oluwadara Alabi
Michael A. Hedderich
David Ifeoluwa Adelani
Dietrich Klakow
483
7
0
27 May 2025
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Titouan Parcollet
Yuan Tseng
Shucong Zhang
Rogier van Dalen
144
3
0
27 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
341
0
0
23 May 2025
Granary: Speech Recognition and Translation Dataset in 25 European Languages
Nithin Rao Koluguri
Monica Sekoyan
George Zelenfroynd
Sasha Meister
Shuoyang Ding
...
Yifan Peng
Sara Papi
Marco Gaido
Alessio Brutti
Boris Ginsburg
250
6
0
19 May 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
306
8
0
26 Mar 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Longji Xu
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
365
14
0
27 Jan 2025
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
327
27
0
03 Oct 2024
FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs
International Society for Music Information Retrieval Conference (ISMIR), 2024
Hitoshi Suda
Shunsuke Yoshida
Tomohiko Nakamura
Satoru Fukayama
Jun Ogata
178
3
0
19 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
386
12
0
17 Sep 2024
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa
Takashi Masuko
Toru Taniguchi
207
2
0
15 Aug 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
346
66
0
20 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
331
44
0
30 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
515
34
0
17 Jun 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM, OSLM
310
90
0
30 Jan 2024