MUSAN: A Music, Speech, and Noise Corpus


28 October 2015
David Snyder
Guoguo Chen
Daniel Povey

Papers citing "MUSAN: A Music, Speech, and Noise Corpus"

50 / 664 papers shown
AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
Tuan Nguyen
Huy-Dat Tran
133
0
0
17 Jun 2025
A Comparative Study on Proactive and Passive Detection of Deepfake Speech
Chia-Hua Wu
W. Ge
Xin Eric Wang
Junichi Yamagishi
Yu Tsao
H. Wang
AAML
193
3
0
17 Jun 2025
Manipulated Regions Localization For Partially Deepfake Audio: A Survey
Jiayi He
Jiangyan Yi
Jianhua Tao
Siding Zeng
Hao Gu
194
2
0
17 Jun 2025
Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models
Bo Li
C. Xu
Wufeng Zhang
LRM
353
2
0
16 Jun 2025
Mitigating Non-Target Speaker Bias in Guided Speaker Embedding
Shota Horiguchi
Takanori Ashihara
Marc Delcroix
Atsushi Ando
Naohiro Tawara
147
0
0
14 Jun 2025
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
S. Araki
H. Bredin
183
2
0
13 Jun 2025
SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research
Ahmed Adel Attia
Jing Liu
C. Espy-Wilson
117
0
0
10 Jun 2025
Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
David Palzer
Matthew Maciejewski
Eric Fosler-Lussier
98
3
0
05 Jun 2025
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Badr M. Abdullah
Matthew Baas
Bernd Möbius
Dietrich Klakow
136
1
0
30 May 2025
Visual Cues Support Robust Turn-taking Prediction in Noise
Interspeech, 2025
Sam O'Connor Russell
Naomi Harte
227
1
0
28 May 2025
Exploring Generative Error Correction for Dysarthric Speech Recognition
Moreno La Quatra
Alkis Koudounas
Valerio Mario Salerno
Sabato Marco Siniscalchi
173
2
0
26 May 2025
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Interspeech, 2025
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
188
1
0
26 May 2025
Learning Emotion-Invariant Speaker Representations for Speaker Verification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jingguang Tian
Xinhui Hu
Xinkang Xu
291
6
0
24 May 2025
SEED: Speaker Embedding Enhancement Diffusion Model
KiHyun Nam
Jungwoo Heo
Jee-weon Jung
Gangin Park
Chaeyoung Jung
Ha-Jin Yu
Joon Son Chung
DiffM
231
0
0
22 May 2025
Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting
Youngmoon Jung
Yong-Hyeok Lee
Myunghun Jung
Jaeyoung Roh
Chang Woo Han
Hoon-Young Cho
343
1
0
22 May 2025
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
Hongfei Xue
Yufeng Tang
Jun Zhang
Xuelong Geng
Lei Xie
269
0
0
22 May 2025
VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
274
9
0
21 May 2025
SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification
Theo Lepage
Reda Dehak
241
3
0
20 May 2025
Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down
Yingzhi Wang
Anas Alhmoud
Saad Alsahly
Muhammad Alqurishi
Mirco Ravanelli
229
1
0
19 May 2025
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM, KELM, VLM
429
1
0
06 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
1.0K
0
0
06 May 2025
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
Ya Li
Bin Zhou
Bo Hu
870
0
0
06 May 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
220
0
0
23 Apr 2025
Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
Tianchi Liu
Duc-Tuan Truong
Rohan Kumar Das
K. Lee
Haizhou Li
280
18
0
08 Apr 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
331
1
0
11 Mar 2025
A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment
K. Inoue
Yuki Okafuji
Jun Baba
Yoshiki Ohira
Katsuya Hyodo
Tatsuya Kawahara
174
2
0
08 Mar 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Christopher Simic
Korbinian Riedhammer
Tobias Bocklet
466
1
0
03 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
IEEE Signal Processing Letters (IEEE SPL), 2025
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
530
1
0
03 Feb 2025
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Junan Zhang
Jing Yang
Zihao Fang
Longji Xu
Zehua Zhang
Zhuo Wang
Fan Fan
Zhikai Wu
DiffM
488
25
0
26 Jan 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
277
2
0
23 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
International Conference on Learning Representations (ICLR), 2025
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
504
4
0
23 Jan 2025
Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Mateusz Barański
Jan Jasiński
Julitta Bartolewska
Stanisław Kacprzak
Marcin Witkowski
K. Kowalczyk
167
13
0
20 Jan 2025
Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
Li Zhang
Jiyao Liu
Lei Xie
271
0
0
15 Jan 2025
Multi-modal Speech Enhancement with Limited Electromyography Channels
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Fuyuan Feng
Longting Xu
R. Das
84
1
0
11 Jan 2025
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Information Fusion (Inf. Fusion), 2025
Rui Liu
Hongyu Yuan
Hong Li
296
2
0
03 Jan 2025
Guided Speaker Embedding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Shota Horiguchi
Takafumi Moriya
Atsushi Ando
Takanori Ashihara
Hiroshi Sato
Naohiro Tawara
Marc Delcroix
326
4
0
03 Jan 2025
VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Hoang Long Vu
Phuong Tuan Dat
Pham Thao Nhi
Nguyen Song Hao
Nguyen Thi Thu Trang
95
2
0
03 Jan 2025
Text-Aware Adapter for Few-Shot Keyword Spotting
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Youngmoon Jung
Jinyoung Lee
Seungjin Lee
Myunghun Jung
Yong-Hyeok Lee
Hoon-Young Cho
121
3
0
24 Dec 2024
On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
Spoken Language Technology Workshop (SLT), 2024
Chenyang Guo
Liping Chen
Zhuhai Li
Kong Aik Lee
Zhen-Hua Ling
Wu Guo
AAML
313
1
0
12 Dec 2024
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
Neural Information Processing Systems (NeurIPS), 2024
Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro-Velazquez
Ariya Rastrow
Najim Dehak
Jesus Villalba
313
3
0
05 Dec 2024
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Bei Liu
Yanmin Qian
445
0
0
02 Dec 2024
SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio
International Conference on Pattern Recognition (ICPR), 2024
Erik Tegler
Magnus Oskarsson
Kalle Åström
228
1
0
20 Nov 2024
Transferable Adversarial Attacks against ASR
IEEE Signal Processing Letters (SPL), 2024
Xiaoxue Gao
Zexin Li
Yiming Chen
Cong Liu
Haoyang Li
AAML
250
3
0
14 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
Marcelo Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
439
10
0
06 Nov 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa, BDL, AuLLM, VLM
350
40
0
23 Oct 2024
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2024
Wen Huang
Bing Han
Zhengyang Chen
Shuai Wang
Yanmin Qian
VLM, SSL
208
0
0
22 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
Natsuo Yamashita
Masaaki Yamamoto
Yohei Kawaguchi
236
1
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
359
2
0
17 Oct 2024
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
Mao-Kui He
Jun Du
Shu-Tong Niu
Qing-Feng Liu
Chin-Hui Lee
208
2
0
15 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
325
0
0
14 Oct 2024