MUSAN: A Music, Speech, and Noise Corpus

28 October 2015

Papers citing "MUSAN: A Music, Speech, and Noise Corpus"

50 / 664 papers shown

AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR

Tuan Nguyen

Huy-Dat Tran

133

17 Jun 2025

A Comparative Study on Proactive and Passive Detection of Deepfake Speech

193

17 Jun 2025

Manipulated Regions Localization For Partially Deepfake Audio: A Survey

194

17 Jun 2025

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

353

16 Jun 2025

Mitigating Non-Target Speaker Bias in Guided Speaker Embedding

147

14 Jun 2025

Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering

183

13 Jun 2025

SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research

Ahmed Adel Attia

Jing Liu

C. Espy-Wilson

117

10 Jun 2025

Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

David Palzer

Matthew Maciejewski

Eric Fosler-Lussier

05 Jun 2025

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification

136

30 May 2025

Visual Cues Support Robust Turn-taking Prediction in Noise

Interspeech (Interspeech), 2025

Sam O'Connor Russell

Naomi Harte

227

28 May 2025

Exploring Generative Error Correction for Dysarthric Speech Recognition

Moreno La Quatra

Alkis Koudounas

Valerio Mario Salerno

Sabato Marco Siniscalchi

173

26 May 2025

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech

Interspeech (Interspeech), 2025

188

26 May 2025

Learning Emotion-Invariant Speaker Representations for Speaker Verification

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Jingguang Tian

Xinhui Hu

Xinkang Xu

291

24 May 2025

SEED: Speaker Embedding Enhancement Diffusion Model

231

22 May 2025

Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting

343

22 May 2025

Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty

269

22 May 2025

VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models

Qunshan Gu

Yanfeng Wang

Yu Wang

AuLLM

274

21 May 2025

SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification

Theo Lepage

Reda Dehak

241

20 May 2025

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

229

19 May 2025

SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation

International Joint Conference on Artificial Intelligence (IJCAI), 2025

429

06 May 2025

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization

1.0K

06 May 2025

MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification

Ya Li

Bin Zhou

Bo Hu

870

06 May 2025

SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

220

23 Apr 2025

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing

280

08 Apr 2025

An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR

Sewade Ogun

Vincent Colotte

Emmanuel Vincent

331

11 Mar 2025

A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment

174

08 Mar 2025

Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Christopher Simic

Korbinian Riedhammer

Tobias Bocklet

466

03 Feb 2025

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

IEEE Signal Processing Letters (IEEE SPL), 2025

530

03 Feb 2025

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

488

26 Jan 2025

Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement

277

23 Jan 2025

Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

International Conference on Learning Representations (ICLR), 2025

504

23 Jan 2025

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

167

20 Jan 2025

Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification

Li Zhang

Jiyao Liu

Lei Xie

271

15 Jan 2025

Multi-modal Speech Enhancement with Limited Electromyography Channels

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Fuyuan Feng

Longting Xu

R. Das

11 Jan 2025

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition

Information Fusion (Inf. Fusion), 2025

Rui Liu

Hongyu Yuan

Hong Li

296

03 Jan 2025

Guided Speaker Embedding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

326

03 Jan 2025

VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

03 Jan 2025

Text-Aware Adapter for Few-Shot Keyword Spotting

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

121

24 Dec 2024

On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection

Spoken Language Technology Workshop (SLT), 2024

313

12 Dec 2024

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Neural Information Processing Systems (NeurIPS), 2024

Yen-Ju Lu

Jing Liu

Thomas Thebaud

Laureano Moro-Velazquez

Ariya Rastrow

Najim Dehak

Jesus Villalba

313

05 Dec 2024

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification

Bei Liu

Yanmin Qian

445

02 Dec 2024

SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio

International Conference on Pattern Recognition (ICPR), 2024

Erik Tegler

Magnus Oskarsson

Kalle Åström

228

20 Nov 2024

Transferable Adversarial Attacks against ASR

IEEE Signal Processing Letters (SPL), 2024

250

14 Nov 2024

Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

439

06 Nov 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

...

Zhihao Du

Shiliang Zhang

SyDa BDL AuLLM VLM

350

23 Oct 2024

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification

International Symposium on Chinese Spoken Language Processing (ISCSLP), 2024

208

22 Oct 2024

End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features

Natsuo Yamashita

Masaaki Yamamoto

Yohei Kawaguchi

236

17 Oct 2024

Sound Check: Auditing Audio Datasets

William Agnew

Harry H. Jiang

Sauvik Das

359

17 Oct 2024

Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization

208

15 Oct 2024

JOOCI: a Framework for Learning Comprehensive Speech Representations

Hemant Yadav

R. Shah

Sunayana Sitaram

325

14 Oct 2024