Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

19 May 2020
Benjamin van Niekerk
Leanne Nortje
Herman Kamper

Papers citing "Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge"

50 / 73 papers shown
Towards Audio Token Compression in Large Audio Language Models
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
387
2
0
26 Nov 2025
Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
L. Pepino
Pablo Riera
Juan Kamienkowski
Luciana Ferrer
164
0
0
20 Nov 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
582
0
0
11 Apr 2025
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass
Abrit Pal Singh
Srihari J
Sheetal Kalyani
VLM
220
0
0
24 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Spoken Language Technology Workshop (SLT), 2024
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
239
2
0
17 Sep 2024
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
International Conference on Speech Technology and Human-Computer Dialogue (ICSTHD), 2024
Leanne Nortje
Dan Oneaţă
Gabriel Pirlogeanu
Herman Kamper
VLM
353
0
0
09 Sep 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
331
2
0
20 Mar 2024
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
301
15
0
16 Oct 2023
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
ACM Multimedia (ACM MM), 2023
Zheng-Yan Sheng
Yang Ai
Yan-Nian Chen
Zhenhua Ling
CVBM
202
11
0
18 Sep 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Neural Information Processing Systems (NeurIPS), 2023
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
361
40
0
02 Aug 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
191
3
0
14 Jul 2023
Rhythm Modeling for Voice Conversion
IEEE Signal Processing Letters (IEEE SPL), 2023
Benjamin van Niekerk
M. Carbonneau
Herman Kamper
335
9
0
12 Jul 2023
Visually grounded few-shot word learning in low-resource settings
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
249
4
0
20 Jun 2023
Privacy in Speech Technology
Tomas Bäckström
443
11
0
09 May 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Spoken Language Technology Workshop (SLT), 2022
Yinghao Aaron Li
Cong Han
N. Mesgarani
203
23
0
29 Dec 2022
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sung-Lin Yeh
Hao Tang
SSL BDL
222
1
0
29 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
272
43
0
27 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
276
2
0
23 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM SSL
322
38
0
16 Oct 2022
Towards visually prompted keyword localisation for zero-resource spoken languages
Spoken Language Technology Workshop (SLT), 2022
Leanne Nortje
Herman Kamper
191
6
0
12 Oct 2022
Non-Parallel Voice Conversion for ASR Augmentation
Interspeech (Interspeech), 2022
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
261
3
0
15 Sep 2022
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions
Interspeech (Interspeech), 2022
Yeonjong Choi
Chao Xie
Tomoki Toda
DiffM
207
4
0
30 Jun 2022
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery
Interspeech (Interspeech), 2022
W. V. D. Merwe
Herman Kamper
J. D. Preez
217
3
0
23 Jun 2022
Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE
Interspeech (Interspeech), 2022
Marc-Antoine Georges
J. Schwartz
Thomas Hueber
SSL
227
5
0
17 Jun 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL AI4TS
776
471
0
21 May 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Interspeech (Interspeech), 2022
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
284
10
0
19 May 2022
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
International Conference on Machine Learning (ICML), 2022
Yuhta Takida
Takashi Shibuya
Wei-Hsiang Liao
Chieh-Hsin Lai
Junki Ohmura
Toshimitsu Uesaka
Naoki Murata
Shusuke Takahashi
Toshiyuki Kumakura
Yuki Mitsufuji
BDL
293
97
0
16 May 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Interspeech (Interspeech), 2022
Sung-Lin Yeh
Hao Tang
SSL
285
8
0
29 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Interspeech (Interspeech), 2022
Gašper Beguš
Alan Zhou
SSL
419
6
0
22 Mar 2022
Modelling word learning and recognition using visually grounded speech
Danny Merkx
Sebastiaan Scholten
S. Frank
M. Ernestus
O. Scharenborg
SSL
288
0
0
14 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL AI4TS SSL
265
13
0
01 Mar 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Herman Kamper
334
31
0
24 Feb 2022
AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
315
63
0
21 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Disong Wang
Shan Yang
Jane Polak Scowcroft
Xunying Liu
Dong Yu
Helen Meng
200
11
0
18 Feb 2022
Robust Vector Quantized-Variational Autoencoder
Chieh-Hsin Lai
Dongmian Zou
Gilad Lerman
DRL
348
6
0
04 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2022
Akira Taniguchi
Hiroaki Murakami
Ryo Ozaki
T. Taniguchi
276
2
0
18 Jan 2022
Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Alex F. McKinney
Benjamin Cauchi
269
3
0
24 Nov 2021
Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
Tomoki Toda
151
15
0
13 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
459
162
0
03 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
334
12
0
27 Oct 2021
Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Gašper Beguš
Alan Zhou
FAtt SSL
271
6
0
05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
392
28
0
05 Oct 2021
Noisy-to-Noisy Voice Conversion Framework with Denoising Model
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
Tomoki Toda
245
11
0
22 Sep 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Zhan Zhang
Yuehai Wang
Jianyi Yang
293
3
0
12 Aug 2021
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
Benjamin van Niekerk
Leanne Nortje
Matthew Baas
Herman Kamper
SSL
307
34
0
02 Aug 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer
Automatic Speech Recognition & Understanding (ASRU), 2021
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
314
25
0
08 Jul 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Interspeech (Interspeech), 2021
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
219
178
0
18 Jun 2021
Unsupervised Automatic Speech Recognition: A Review
Speech Communication (Speech Commun.), 2021
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM SSL
186
70
0
09 Jun 2021
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation
Interspeech (Interspeech), 2021
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
226
44
0
03 Jun 2021
Unsupervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
SSL
475
295
0
24 May 2021