2005.09409
Cited By
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge
19 May 2020
Benjamin van Niekerk
Leanne Nortje
Herman Kamper
Papers citing "Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge"
50 / 71 papers shown
Title
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
59
0
0
11 Apr 2025
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass
Abrit Pal Singh
Srihari J
Sheetal Kalyani
VLM
31
0
0
24 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
35
0
0
17 Sep 2024
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
43
0
0
09 Sep 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
47
0
0
20 Mar 2024
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
39
8
0
16 Oct 2023
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
Zheng-Yan Sheng
Yang Ai
Yan-Nian Chen
Zhenhua Ling
CVBM
19
4
0
18 Sep 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
27
21
0
02 Aug 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
32
2
0
14 Jul 2023
Rhythm Modeling for Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Herman Kamper
40
5
0
12 Jul 2023
Visually grounded few-shot word learning in low-resource settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
23
4
0
20 Jun 2023
Privacy in Speech Technology
Tomas Bäckström
32
4
0
09 May 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
24
18
0
29 Dec 2022
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
Sung-Lin Yeh
Hao Tang
SSL
BDL
35
1
0
29 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
34
30
0
27 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
34
2
0
23 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
36
33
0
16 Oct 2022
Towards visually prompted keyword localisation for zero-resource spoken languages
Leanne Nortje
Herman Kamper
29
6
0
12 Oct 2022
Non-Parallel Voice Conversion for ASR Augmentation
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
26
2
0
15 Sep 2022
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions
Yeonjong Choi
Chao Xie
T. Toda
DiffM
38
2
0
30 Jun 2022
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery
W. V. D. Merwe
Herman Kamper
J. D. Preez
22
2
0
23 Jun 2022
Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE
Marc-Antoine Georges
J. Schwartz
Thomas Hueber
SSL
19
5
0
17 Jun 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
354
0
21 May 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
37
8
0
19 May 2022
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
Yuhta Takida
Takashi Shibuya
Wei-Hsiang Liao
Chieh-Hsin Lai
Junki Ohmura
Toshimitsu Uesaka
Naoki Murata
Shusuke Takahashi
Toshiyuki Kumakura
Yuki Mitsufuji
BDL
26
61
0
16 May 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Sung-Lin Yeh
Hao Tang
SSL
27
6
0
29 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Gašper Beguš
Alan Zhou
SSL
27
5
0
22 Mar 2022
Modelling word learning and recognition using visually grounded speech
Danny Merkx
Sebastiaan Scholten
S. Frank
M. Ernestus
O. Scharenborg
SSL
37
0
0
14 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
Herman Kamper
34
25
0
24 Feb 2022
AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
17
54
0
21 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Dan Su
Xunying Liu
Dong Yu
Helen Meng
23
11
0
18 Feb 2022
Robust Vector Quantized-Variational Autoencoder
Chieh-Hsin Lai
Dongmian Zou
Gilad Lerman
DRL
32
5
0
04 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
Akira Taniguchi
Hiroaki Murakami
Ryo Ozaki
T. Taniguchi
23
2
0
18 Jan 2022
Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Alex F. McKinney
Benjamin Cauchi
20
3
0
24 Nov 2021
Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
T. Toda
26
9
0
13 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
27
111
0
03 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
29
11
0
27 Oct 2021
Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Gašper Beguš
Alan Zhou
FAtt
SSL
33
5
0
05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
55
22
0
05 Oct 2021
Noisy-to-Noisy Voice Conversion Framework with Denoising Model
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
T. Toda
23
7
0
22 Sep 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
Zhan Zhang
Yuehai Wang
Jianyi Yang
30
3
0
12 Aug 2021
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
Benjamin van Niekerk
Leanne Nortje
Matthew Baas
Herman Kamper
SSL
38
31
0
02 Aug 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
35
20
0
08 Jul 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
22
136
0
18 Jun 2021
Unsupervised Automatic Speech Recognition: A Review
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
39
57
0
09 Jun 2021
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
21
37
0
03 Jun 2021
Unsupervised Speech Recognition
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
SSL
28
271
0
24 May 2021
Discrete representations in neural models of spoken language
Bertrand Higy
Lieke Gelderloos
A. Alishahi
Grzegorz Chrupała
21
6
0
12 May 2021
VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding
J. Nistal
Cyran Aouameur
Stefan Lattner
G. Richard
25
7
0
04 May 2021