MUSAN: A Music, Speech, and Noise Corpus

28 October 2015
David Snyder
Guoguo Chen
Daniel Povey

Papers citing "MUSAN: A Music, Speech, and Noise Corpus"

50 / 664 papers shown
GraFPrint: A GNN-Based Approach for Audio Identification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Aditya Bhattacharjee
Shubhr Singh
Emmanouil Benetos
253
5
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
434
9
0
09 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
236
10
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Di Liang
Xiaofei Li
341
2
0
09 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
318
1
0
07 Oct 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
213
4
0
28 Sep 2024
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ruoyu Wang
Shutong Niu
Gaobin Yang
Jun Du
Shuangqing Qian
Tian Gao
Jia Pan
298
4
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
471
3
0
25 Sep 2024
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
Fengrun Zhang
Wangjin Zhou
Yiming Liu
Wang Geng
Yahui Shan
Chen Zhang
213
0
0
24 Sep 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Interspeech, 2024
Shuai Wang
Ke Zhang
Shaoxiong Lin
Junjie Li
Xuefei Wang
Meng Ge
Jianwei Yu
Yanmin Qian
Haizhou Li
190
20
0
24 Sep 2024
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
Shuai Wang
Pengcheng Zhu
Haizhou Li
177
0
0
24 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
Junyi Peng
Ladislav Mošner
Lin Zhang
Oldrich Plchot
Themos Stafylakis
Lukáš Burget
Jan Černocký
165
4
0
23 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiaoyu Bie
Xubo Liu
Gaël Richard
233
13
0
17 Sep 2024
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Li-Wei Chen
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
VLM, SSL
235
2
0
16 Sep 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge
Spoken Language Technology Workshop (SLT), 2024
Shuiyun Liu
Yuxiang Kong
Pengcheng Guo
Weiji Zhuang
Peng Gao
Yujun Wang
Lei Xie
295
1
0
16 Sep 2024
Speaker Contrastive Learning for Source Speaker Tracing
Spoken Language Technology Workshop (SLT), 2024
Qing Wang
Hongmei Guo
Jian Kang
Mengjie Du
Jie Li
Xiao-Lei Zhang
Lei Xie
288
1
0
16 Sep 2024
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
Spoken Language Technology Workshop (SLT), 2024
Junjie Li
Ke Zhang
Shuai Wang
Haizhou Li
Man-Wai Mak
Kong Aik Lee
143
9
0
15 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
279
2
0
13 Sep 2024
Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better
Mengying Ge
Mingyang Li
Dongkai Tang
Pengbo Li
Kuo Liu
Shuhao Deng
Songbai Pu
Liu Liu
Yang Song
Tao Zhang
226
7
0
12 Sep 2024
Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
Spoken Language Technology Workshop (SLT), 2024
Chang Zeng
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
AAML
189
2
0
10 Sep 2024
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Spoken Language Technology Workshop (SLT), 2024
Hongfei Xue
Rong Gong
Mingchen Shao
Xin Xu
L. xilinx Wang
...
Yong Qin
Jun Du
Ming Li
Binbin Zhang
Bin Jia
182
5
0
09 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
309
11
0
03 Sep 2024
USTC-KXDIGIT System Description for ASVspoof5 Challenge
Yihao Chen
Haochen Wu
Nan Jiang
Xiang Xia
Qing Gu
...
Sian Fang
Yan Song
Wu Guo
Lin Liu
Minqiang Xu
214
6
0
03 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Spoken Language Technology Workshop (SLT), 2024
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
226
4
0
02 Sep 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
273
18
0
27 Aug 2024
A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection
Biometrics and Electronic Signatures (BES), 2024
Xuechen Liu
Xin Wang
Junichi Yamagishi
169
1
0
26 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL, AI4TS
320
4
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
180
8
0
20 Aug 2024
Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
Yuankun Xie
Xiaopeng Wang
Zhiyong Wang
Ruibo Fu
Zhengqi Wen
Haonan Cheng
Long Ye
193
2
0
13 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
266
13
0
09 Aug 2024
Language Model Can Listen While Speaking
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Longji Xu
Xie Chen
AuLLM
259
47
0
05 Aug 2024
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
Li Zhang
Ning Jiang
Qing Wang
Yuehong Li
Quan Lu
Lei Xie
229
16
0
14 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
302
11
0
08 Jul 2024
WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System
Yang Xiao
Rohan Kumar Das
222
14
0
04 Jul 2024
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Sungnyun Kim
Kangwook Jang
Sangmin Bae
Hoirin Kim
Se-Young Yun
238
6
0
04 Jul 2024
GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Hui Yan
Zhenchun Lei
Changhong Liu
Yong Zhou
166
2
0
03 Jul 2024
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
Juan Ignacio Alvarez-Trejos
Beltrán Labrador
Alicia Lozano-Diez
350
2
0
01 Jul 2024
Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
Oliver Schrufer
M. Milling
Felix Burkhardt
F. Eyben
Björn Schuller
190
5
0
01 Jul 2024
FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels
Yang Xiao
Han Yin
Jisheng Bai
Rohan Kumar Das
215
7
0
29 Jun 2024
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization
Xiang Li
Vivek Govindan
Rohit Paturi
S. Srinivasan
152
1
0
26 Jun 2024
A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR
Van Tung Pham
Yist Y. Lin
Tao Han
Wei Li
Jun Zhang
Lu Lu
Yuxuan Wang
AuLLM
163
2
0
25 Jun 2024
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
KiHyun Nam
Hee-Soo Heo
Jee-weon Jung
Joon Son Chung
231
2
0
20 Jun 2024
CEC: A Noisy Label Detection Method for Speaker Recognition
Interspeech, 2024
Yao Shen
Yingying Gao
Yaqian Hao
Chenguang Hu
Fulin Zhang
Junlan Feng
Shilei Zhang
NoLa
133
0
0
19 Jun 2024
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
Yafeng Chen
Siqi Zheng
Hui Wang
Luyao Cheng
Qian Chen
Shiliang Zhang
Wen Wang
SSL
140
6
0
17 Jun 2024
Robust Channel Learning for Large-Scale Radio Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
208
3
0
16 Jun 2024
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
The Speaker and Language Recognition Workshop (Odyssey), 2024
Federico Costa
Miquel India
Javier Hernando
232
6
0
15 Jun 2024
SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
Natarajan Balaji Shankar
Ruchao Fan
Abeer Alwan
245
1
0
15 Jun 2024
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
Martina Valente
Fabio Brugnara
Giovanni Morrone
Enrico Zovato
Leonardo Badino
173
2
0
13 Jun 2024
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
Samuele Cornell
Janek Ebbers
Constance Douwes
Irene Martín-Morató
Manu Harju
A. Mesaros
Romain Serizel
192
23
0
12 Jun 2024
Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness
Satyam Kumar
Sai Srujana Buddi
U. Sarawgi
Vineet Garg
Shivesh Ranjan
Ognjen Rudovic
Ahmed Hussen Abdelaziz
Saurabh N. Adya
200
5
0
12 Jun 2024