1510.08484
Cited By
MUSAN: A Music, Speech, and Noise Corpus
28 October 2015
David Snyder
Guoguo Chen
Daniel Povey
Papers citing
"MUSAN: A Music, Speech, and Noise Corpus"
50 / 664 papers shown
GraFPrint: A GNN-Based Approach for Audio Identification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Aditya Bhattacharjee
Shubhr Singh
Emmanouil Benetos
253
5
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
434
9
0
09 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
236
10
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Di Liang
Xiaofei Li
341
2
0
09 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
318
1
0
07 Oct 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
213
4
0
28 Sep 2024
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ruoyu Wang
Shutong Niu
Gaobin Yang
Jun Du
Shuangqing Qian
Tian Gao
Jia Pan
298
4
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
471
3
0
25 Sep 2024
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
Fengrun Zhang
Wangjin Zhou
Yiming Liu
Wang Geng
Yahui Shan
Chen Zhang
213
0
0
24 Sep 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Interspeech, 2024
Shuai Wang
Ke Zhang
Shaoxiong Lin
Junjie Li
Xuefei Wang
Meng Ge
Jianwei Yu
Yanmin Qian
Haizhou Li
190
20
0
24 Sep 2024
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
Shuai Wang
Pengcheng Zhu
Haizhou Li
177
0
0
24 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
Junyi Peng
Ladislav Mošner
Lin Zhang
Oldrich Plchot
Themos Stafylakis
Lukáš Burget
Jan Černocký
165
4
0
23 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiaoyu Bie
Xubo Liu
Gaël Richard
233
13
0
17 Sep 2024
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Li-Wei Chen
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
VLM
SSL
235
2
0
16 Sep 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge
Spoken Language Technology Workshop (SLT), 2024
Shuiyun Liu
Yuxiang Kong
Pengcheng Guo
Weiji Zhuang
Peng Gao
Yujun Wang
Lei Xie
295
1
0
16 Sep 2024
Speaker Contrastive Learning for Source Speaker Tracing
Spoken Language Technology Workshop (SLT), 2024
Qing Wang
Hongmei Guo
Jian Kang
Mengjie Du
Jie Li
Xiao-Lei Zhang
Lei Xie
288
1
0
16 Sep 2024
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
Spoken Language Technology Workshop (SLT), 2024
Junjie Li
Ke Zhang
Shuai Wang
Haizhou Li
Man-Wai Mak
Kong Aik Lee
143
9
0
15 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
279
2
0
13 Sep 2024
Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better
Mengying Ge
Mingyang Li
Dongkai Tang
Pengbo Li
Kuo Liu
Shuhao Deng
Songbai Pu
Liu Liu
Yang Song
Tao Zhang
226
7
0
12 Sep 2024
Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
Spoken Language Technology Workshop (SLT), 2024
Chang Zeng
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
AAML
189
2
0
10 Sep 2024
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Spoken Language Technology Workshop (SLT), 2024
Hongfei Xue
Rong Gong
Mingchen Shao
Xin Xu
L. xilinx Wang
...
Yong Qin
Jun Du
Ming Li
Binbin Zhang
Bin Jia
182
5
0
09 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
309
11
0
03 Sep 2024
USTC-KXDIGIT System Description for ASVspoof5 Challenge
Yihao Chen
Haochen Wu
Nan Jiang
Xiang Xia
Qing Gu
...
Sian Fang
Yan Song
Wu Guo
Lin Liu
Minqiang Xu
214
6
0
03 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Spoken Language Technology Workshop (SLT), 2024
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
226
4
0
02 Sep 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
273
18
0
27 Aug 2024
A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection
Biometrics and Electronic Signatures (BES), 2024
Xuechen Liu
Xin Wang
Junichi Yamagishi
169
1
0
26 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
320
4
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
180
8
0
20 Aug 2024
Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
Yuankun Xie
Xiaopeng Wang
Zhiyong Wang
Ruibo Fu
Zhengqi Wen
Haonan Cheng
Long Ye
193
2
0
13 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
266
13
0
09 Aug 2024
Language Model Can Listen While Speaking
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Longji Xu
Xie Chen
AuLLM
259
47
0
05 Aug 2024
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
Li Zhang
Ning Jiang
Qing Wang
Yuehong Li
Quan Lu
Lei Xie
229
16
0
14 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
302
11
0
08 Jul 2024
WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System
Yang Xiao
Rohan Kumar Das
222
14
0
04 Jul 2024
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Sungnyun Kim
Kangwook Jang
Sangmin Bae
Hoirin Kim
Se-Young Yun
238
6
0
04 Jul 2024
GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Hui Yan
Zhenchun Lei
Changhong Liu
Yong Zhou
166
2
0
03 Jul 2024
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
Juan Ignacio Alvarez-Trejos
Beltrán Labrador
Alicia Lozano-Diez
350
2
0
01 Jul 2024
Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
Oliver Schrufer
M. Milling
Felix Burkhardt
F. Eyben
Björn Schuller
190
5
0
01 Jul 2024
FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels
Yang Xiao
Han Yin
Jisheng Bai
Rohan Kumar Das
215
7
0
29 Jun 2024
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization
Xiang Li
Vivek Govindan
Rohit Paturi
S. Srinivasan
152
1
0
26 Jun 2024
A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR
Van Tung Pham
Yist Y. Lin
Tao Han
Wei Li
Jun Zhang
Lu Lu
Yuxuan Wang
AuLLM
163
2
0
25 Jun 2024
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
KiHyun Nam
Hee-Soo Heo
Jee-weon Jung
Joon Son Chung
231
2
0
20 Jun 2024
CEC: A Noisy Label Detection Method for Speaker Recognition
Interspeech, 2024
Yao Shen
Yingying Gao
Yaqian Hao
Chenguang Hu
Fulin Zhang
Junlan Feng
Shilei Zhang
NoLa
133
0
0
19 Jun 2024
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
Yafeng Chen
Siqi Zheng
Hui Wang
Luyao Cheng
Qian Chen
Shiliang Zhang
Wen Wang
SSL
140
6
0
17 Jun 2024
Robust Channel Learning for Large-Scale Radio Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
208
3
0
16 Jun 2024
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
The Speaker and Language Recognition Workshop (Odyssey), 2024
Federico Costa
Miquel India
Javier Hernando
232
6
0
15 Jun 2024
SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
Natarajan Balaji Shankar
Ruchao Fan
Abeer Alwan
245
1
0
15 Jun 2024
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
Martina Valente
Fabio Brugnara
Giovanni Morrone
Enrico Zovato
Leonardo Badino
173
2
0
13 Jun 2024
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
Samuele Cornell
Janek Ebbers
Constance Douwes
Irene Martín-Morató
Manu Harju
A. Mesaros
Romain Serizel
192
23
0
12 Jun 2024
Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness
Satyam Kumar
Sai Srujana Buddi
U. Sarawgi
Vineet Garg
Shivesh Ranjan
Ognjen Rudovic
Ahmed Hussen Abdelaziz
Saurabh N. Adya
200
5
0
12 Jun 2024