1510.08484
Cited By
MUSAN: A Music, Speech, and Noise Corpus
28 October 2015
David Snyder
Guoguo Chen
Daniel Povey
Papers citing
"MUSAN: A Music, Speech, and Noise Corpus"
50 / 664 papers shown
Title
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
Zhenyu Zhou
Shibiao Xu
Shi Yin
Lantian Li
D. Wang
137
5
0
11 Jun 2024
MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms
Seung-bin Kim
Chan-yeong Lim
Jungwoo Heo
Ju-ho Kim
Hyun-Seo Shin
Kyo-Won Koo
Ha-Jin Yu
273
3
0
11 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
315
5
0
09 Jun 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
Interspeech (Interspeech), 2024
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
Zirun Zhu
...
Yufei Xia
Jinzhu Li
Sheng Zhao
Jinyu Li
Naoyuki Kanda
142
6
0
09 Jun 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
Interspeech (Interspeech), 2024
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
273
4
0
08 Jun 2024
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Bei Liu
Haoyu Wang
Yanmin Qian
MQ
384
3
0
08 Jun 2024
To what extent can ASV systems naturally defend against spoofing attacks?
Interspeech (Interspeech), 2024
Jee-weon Jung
Xin Eric Wang
Nicholas W. D. Evans
Shinji Watanabe
Hye-jin Shim
Hemlata Tak
Siddhant Arora
Junichi Yamagishi
Joon Son Chung
AAML
203
10
0
08 Jun 2024
Relational Proxy Loss for Audio-Text based Keyword Spotting
Interspeech (Interspeech), 2024
Youngmoon Jung
Seungjin Lee
Joon-Young Yang
Jaeyoung Roh
Chang Woo Han
Hoon-Young Cho
156
3
0
08 Jun 2024
Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy
Yuankun Xie
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
Xiaopeng Wang
Haonan Cheng
Long Ye
Jianhua Tao
333
12
0
05 Jun 2024
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Victor Miara
Theo Lepage
Reda Dehak
221
7
0
04 Jun 2024
Mamba in Speech: Towards an Alternative to Self-Attention
Xiangyu Zhang
Qiquan Zhang
Hexin Liu
Tianyi Xiao
Xinyuan Qian
Beena Ahmed
E. Ambikairajah
Haizhou Li
Julien Epps
Mamba
378
91
0
21 May 2024
Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
Nian Li
Jianguo Wei
ViT
246
0
0
20 May 2024
Robust Singing Voice Transcription Serves Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ruiqi Li
Yu Zhang
Yongqi Wang
Zhiqing Hong
Rongjie Huang
Zhou Zhao
308
16
0
16 May 2024
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
The Speaker and Language Recognition Workshop (Odyssey), 2024
Jenthe Thienpondt
Kris Demuynck
180
3
0
15 May 2024
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao
Xinyuan Qian
Yidi Jiang
Junjie Li
Jiadong Wang
Haizhou Li
319
2
0
29 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Zhuofan Wen
Siyuan Zhang
...
Yinan Han
Xiaoshi Zhong
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
366
33
0
26 Apr 2024
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
Chengxin Chen
Pengyuan Zhang
205
4
0
19 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
272
55
0
15 Apr 2024
What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions
Interspeech (Interspeech), 2023
Hanyu Meng
V. Sethu
E. Ambikairajah
230
3
0
10 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
280
58
0
03 Apr 2024
Maximum Discrepancy Generative Regularization and Non-Negative Matrix Factorization for Single Channel Source Separation
Martin Ludvigsen
M. Grasmair
142
0
0
26 Mar 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han
Mohamed Anwar
J. Pino
Wei-Ning Hsu
Marine Carpuat
Bowen Shi
Changhan Wang
VLM
246
15
0
21 Mar 2024
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
PeiYing Lee
HauYun Guo
Berlin Chen
183
0
0
21 Mar 2024
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
256
1
0
13 Mar 2024
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
International Conference on Learning Representations (ICLR), 2024
Muhammad A. Shah
David Solans Noguero
Mikko A. Heikkilä
Nicolas Kourtellis
228
12
0
08 Mar 2024
Exploration of Adapter for Noise Robust Automatic Speech Recognition
Hao Shi
Tatsuya Kawahara
258
6
0
28 Feb 2024
ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
123
10
0
23 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
223
32
0
08 Feb 2024
Adversarial Data Augmentation for Robust Speaker Verification
Zhenyu Zhou
Junhui Chen
Namin Wang
Lantian Li
Dong Wang
212
6
0
05 Feb 2024
Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Haesun Joung
Kyogu Lee
174
1
0
27 Jan 2024
Adversarial speech for voice privacy protection from Personalized Speech generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Shihao Chen
Liping Chen
Jie Zhang
KongAik Lee
Zhenhua Ling
Lirong Dai
AAML
209
10
0
22 Jan 2024
An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement
Qiquan Zhang
Meng Ge
Hongxu Zhu
E. Ambikairajah
Qi Song
Zhaoheng Ni
Haizhou Li
233
15
0
18 Jan 2024
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
IEEE International Joint Conference on Neural Network (IJCNN), 2024
Nicolas Müller
Piotr Kawa
Wei Herng Choong
Edresson Casanova
Eren Golge
Thorsten Müller
P. Syga
Philip Sperl
Konstantin Böttinger
376
97
0
17 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Kenichi Fujita
Hiroshi Sato
Takanori Ashihara
Hiroki Kanagawa
Marc Delcroix
Takafumi Moriya
Yusuke Ijima
125
14
0
10 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Yinan Han
Jianhua Tao
398
26
0
07 Jan 2024
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
He Wang
Pengcheng Guo
Pan Zhou
Lei Xie
379
17
0
07 Jan 2024
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
Yi Ma
Kong Aik Lee
Ville Hautamaki
Meng Ge
Haizhou Li
134
1
0
05 Jan 2024
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Danwei Cai
Zexin Cai
Ze Li
Ming Li
284
2
0
03 Jan 2024
Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models
Christopher Simic
Tobias Bocklet
221
10
0
21 Dec 2023
Noise robust distillation of self-supervised speech models via correlation metrics
Fabian Ritter-Gutierrez
Kuan-Po Huang
Dianwen Ng
Jeremy H.M Wong
Hung-yi Lee
Chng Eng Siong
Nancy F. Chen
284
4
0
19 Dec 2023
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Hyunjun Heo
U.H Shin
Ran Lee
YoungJu Cheon
Hyung-Min Park
215
26
0
14 Dec 2023
Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning
Automatic Speech Recognition & Understanding (ASRU), 2023
Ivan Fung
Lahiru Samarakoon
Samuel J. Broughton
OOD
252
2
0
12 Dec 2023
Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
Anna Derington
H. Wierstorf
Ali Özkil
F. Eyben
Felix Burkhardt
Björn W. Schuller
354
2
0
11 Dec 2023
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini
Mireia Díez
Themos Stafylakis
Lukáš Burget
316
20
0
07 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Computer Vision and Pattern Recognition (CVPR), 2023
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
359
16
0
05 Dec 2023
Phonetic-aware speaker embedding for far-field speaker verification
Zezhong Jin
Youzhi Tu
Man-Wai Mak
196
2
0
27 Nov 2023
Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel
Shreyas Ramoji
Somil Jain
Pratik Roy Chowdhuri
Prachi Singh
Deepu Vijayasenan
Sriram Ganapathy
174
10
0
21 Nov 2023
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
Vikentii Pankov
Valeria Pronina
Alexander Kuzmin
Maksim Borisov
Nikita Usoltsev
Xingshan Zeng
Alexander Golubkov
Nikolai Ermolenko
Aleksandra Shirshova
Yulia Matveeva
159
6
0
16 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Heng-Jui Chang
James R. Glass
230
7
0
15 Nov 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Automatic Speech Recognition & Understanding (ASRU), 2023
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
...
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
244
35
0
27 Oct 2023