2005.09409
Cited By
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge
19 May 2020
Benjamin van Niekerk
Leanne Nortje
Herman Kamper
Papers citing
"Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge"
50 / 73 papers shown
Towards Audio Token Compression in Large Audio Language Models
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
387
2
0
26 Nov 2025
Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
L. Pepino
Pablo Riera
Juan Kamienkowski
Luciana Ferrer
164
0
0
20 Nov 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
582
0
0
11 Apr 2025
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass
Abrit Pal Singh
Srihari J
Sheetal Kalyani
VLM
220
0
0
24 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Spoken Language Technology Workshop (SLT), 2024
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
239
2
0
17 Sep 2024
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
International Conference on Speech Technology and Human-Computer Dialogue (ICSTHD), 2024
Leanne Nortje
Dan Oneaţă
Gabriel Pirlogeanu
Herman Kamper
VLM
353
0
0
09 Sep 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
331
2
0
20 Mar 2024
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
301
15
0
16 Oct 2023
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
ACM Multimedia (ACM MM), 2023
Zheng-Yan Sheng
Yang Ai
Yan-Nian Chen
Zhenhua Ling
CVBM
202
11
0
18 Sep 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Neural Information Processing Systems (NeurIPS), 2023
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
361
40
0
02 Aug 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
191
3
0
14 Jul 2023
Rhythm Modeling for Voice Conversion
IEEE Signal Processing Letters (IEEE SPL), 2023
Benjamin van Niekerk
M. Carbonneau
Herman Kamper
335
9
0
12 Jul 2023
Visually grounded few-shot word learning in low-resource settings
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
249
4
0
20 Jun 2023
Privacy in Speech Technology
Tomas Bäckström
443
11
0
09 May 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Spoken Language Technology Workshop (SLT), 2022
Yinghao Aaron Li
Cong Han
N. Mesgarani
203
23
0
29 Dec 2022
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sung-Lin Yeh
Hao Tang
SSL
BDL
222
1
0
29 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
272
43
0
27 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
276
2
0
23 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
322
38
0
16 Oct 2022
Towards visually prompted keyword localisation for zero-resource spoken languages
Spoken Language Technology Workshop (SLT), 2022
Leanne Nortje
Herman Kamper
191
6
0
12 Oct 2022
Non-Parallel Voice Conversion for ASR Augmentation
Interspeech (Interspeech), 2022
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
261
3
0
15 Sep 2022
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions
Interspeech (Interspeech), 2022
Yeonjong Choi
Chao Xie
Tomoki Toda
DiffM
207
4
0
30 Jun 2022
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery
Interspeech (Interspeech), 2022
W. V. D. Merwe
Herman Kamper
J. D. Preez
217
3
0
23 Jun 2022
Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE
Interspeech (Interspeech), 2022
Marc-Antoine Georges
J. Schwartz
Thomas Hueber
SSL
227
5
0
17 Jun 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
776
471
0
21 May 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Interspeech (Interspeech), 2022
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
284
10
0
19 May 2022
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
International Conference on Machine Learning (ICML), 2022
Yuhta Takida
Takashi Shibuya
Wei-Hsiang Liao
Chieh-Hsin Lai
Junki Ohmura
Toshimitsu Uesaka
Naoki Murata
Shusuke Takahashi
Toshiyuki Kumakura
Yuki Mitsufuji
BDL
293
97
0
16 May 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Interspeech (Interspeech), 2022
Sung-Lin Yeh
Hao Tang
SSL
285
8
0
29 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Interspeech (Interspeech), 2022
Gašper Beguš
Alan Zhou
SSL
419
6
0
22 Mar 2022
Modelling word learning and recognition using visually grounded speech
Danny Merkx
Sebastiaan Scholten
S. Frank
M. Ernestus
O. Scharenborg
SSL
288
0
0
14 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
265
13
0
01 Mar 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Herman Kamper
334
31
0
24 Feb 2022
AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
315
63
0
21 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Disong Wang
Shan Yang
Jane Polak Scowcroft
Xunying Liu
Dong Yu
Helen Meng
200
11
0
18 Feb 2022
Robust Vector Quantized-Variational Autoencoder
Chieh-Hsin Lai
Dongmian Zou
Gilad Lerman
DRL
348
6
0
04 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2022
Akira Taniguchi
Hiroaki Murakami
Ryo Ozaki
T. Taniguchi
276
2
0
18 Jan 2022
Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Alex F. McKinney
Benjamin Cauchi
269
3
0
24 Nov 2021
Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
Tomoki Toda
151
15
0
13 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
459
162
0
03 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
334
12
0
27 Oct 2021
Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Gašper Beguš
Alan Zhou
FAtt
SSL
271
6
0
05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
392
28
0
05 Oct 2021
Noisy-to-Noisy Voice Conversion Framework with Denoising Model
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
Tomoki Toda
245
11
0
22 Sep 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Zhan Zhang
Yuehai Wang
Jianyi Yang
293
3
0
12 Aug 2021
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
Benjamin van Niekerk
Leanne Nortje
Matthew Baas
Herman Kamper
SSL
307
34
0
02 Aug 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer
Automatic Speech Recognition & Understanding (ASRU), 2021
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
314
25
0
08 Jul 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Interspeech (Interspeech), 2021
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
219
178
0
18 Jun 2021
Unsupervised Automatic Speech Recognition: A Review
Speech Communication (Speech Commun.), 2021
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
186
70
0
09 Jun 2021
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation
Interspeech (Interspeech), 2021
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
226
44
0
03 Jun 2021
Unsupervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
SSL
475
295
0
24 May 2021