Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

19 May 2020

Papers citing "Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge"

50 / 73 papers shown

Towards Audio Token Compression in Large Audio Language Models

26 Nov 2025

Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks

20 Nov 2025

USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion

11 Apr 2025

Textless NLP -- Zero Resource Challenge with Low Resource Compute

24 Sep 2024

Discrete Unit based Masking for Improving Disentanglement in Voice Conversion

Spoken Language Technology Workshop (SLT), 2024

Philip H. Lee

Ismail Rasim Ulgen

Berrak Sisman

17 Sep 2024

Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings

International Conference on Speech Technology and Human-Computer Dialogue (ICSTHD), 2024

09 Sep 2024

Visually Grounded Speech Models have a Mutual Exclusivity Bias

20 Mar 2024

SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

Cheol Jun Cho

Abdelrahman Mohamed

Shang-Wen Li

Alan W. Black

Gopala K. Anumanchipalli

16 Oct 2023

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

ACM Multimedia (ACM MM), 2023

18 Sep 2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Neural Information Processing Systems (NeurIPS), 2023

Robin San Roman

Yossi Adi

Antoine Deleforge

Romain Serizel

Gabriel Synnaeve

Alexandre Défossez

DiffM

02 Aug 2023

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Varun Krishna

T. Sai

Sriram Ganapathy

SSL

14 Jul 2023

Rhythm Modeling for Voice Conversion

IEEE Signal Processing Letters (IEEE SPL), 2023

Benjamin van Niekerk

M. Carbonneau

Herman Kamper

12 Jul 2023

Visually grounded few-shot word learning in low-resource settings

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

20 Jun 2023

Privacy in Speech Technology

Tomas Bäckström

09 May 2023

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

Spoken Language Technology Workshop (SLT), 2022

Yinghao Aaron Li

Cong Han

N. Mesgarani

29 Dec 2022

Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Sung-Lin Yeh

Hao Tang

SSL BDL

29 Oct 2022

Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

27 Oct 2022

Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

23 Oct 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Spoken Language Technology Workshop (SLT), 2022

Tzu-Quan Lin

...

16 Oct 2022

Towards visually prompted keyword localisation for zero-resource spoken languages

Spoken Language Technology Workshop (SLT), 2022

Leanne Nortje

Herman Kamper

12 Oct 2022

Non-Parallel Voice Conversion for ASR Augmentation

Interspeech, 2022

Gary Wang

Andrew Rosenberg

Bhuvana Ramabhadran

15 Sep 2022

An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions

Interspeech, 2022

30 Jun 2022

A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery

Interspeech, 2022

W. V. D. Merwe

Herman Kamper

J. D. Preez

23 Jun 2022

Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE

Interspeech, 2022

17 Jun 2022

Self-Supervised Speech Representation Learning: A Review

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Abdel-rahman Mohamed

Hung-yi Lee

Lasse Borgholt

Jakob Drachmann Havtorn

...

21 May 2022

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions

Interspeech, 2022

Wonjune Kang

M. Hasegawa-Johnson

D. Roy

19 May 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

International Conference on Machine Learning (ICML), 2022

16 May 2022

Autoregressive Co-Training for Learning Discrete Speech Representations

Interspeech, 2022

Sung-Lin Yeh

Hao Tang

SSL

29 Mar 2022

Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Interspeech, 2022

Gašper Beguš

Alan Zhou

SSL

22 Mar 2022

Modelling word learning and recognition using visually grounded speech

14 Mar 2022

A Brief Overview of Unsupervised Neural Speech Representation Learning

Lasse Borgholt

Jakob Drachmann Havtorn

01 Mar 2022

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Herman Kamper

24 Feb 2022

AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

21 Feb 2022

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Shan Yang

18 Feb 2022

Robust Vector Quantized-Variational Autoencoder

04 Feb 2022

Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues

IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2022

18 Jan 2022

Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations

Alex F. McKinney

Benjamin Cauchi

24 Nov 2021

Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Chao Xie

Yi-Chiao Wu

Patrick Lumban Tobing

Wen-Chin Huang

Tomoki Toda

13 Nov 2021

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

03 Nov 2021

Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Shijun Wang

Dimche Kostadinov

Damian Borth

27 Oct 2021

Interpreting intermediate convolutional layers in unsupervised acoustic word classification

Gašper Beguš

Alan Zhou

FAtt SSL

05 Oct 2021

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Saurabhchand Bhati

Jesús Villalba

Piotr Żelasko

Laureano Moro-Velazquez

Najim Dehak

SSL

05 Oct 2021

Noisy-to-Noisy Voice Conversion Framework with Denoising Model

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021

Chao Xie

Yi-Chiao Wu

Patrick Lumban Tobing

Wen-Chin Huang

Tomoki Toda

22 Sep 2021

Masked Acoustic Unit for Mispronunciation Detection and Correction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Zhan Zhang

Yuehai Wang

Jianyi Yang

12 Aug 2021

Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

02 Aug 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer

Automatic Speech Recognition & Understanding (ASRU), 2021

Zongyang Du

Berrak Sisman

Kun Zhou

Haizhou Li

08 Jul 2021

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

Interspeech, 2021

18 Jun 2021

Unsupervised Automatic Speech Recognition: A Review

Speech Communication (Speech Commun.), 2021

09 Jun 2021

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

Interspeech, 2021

Saurabhchand Bhati

Jesús Villalba

Piotr Żelasko

Laureano Moro-Velazquez

Najim Dehak

SSL

03 Jun 2021

Unsupervised Speech Recognition

Neural Information Processing Systems (NeurIPS), 2021

24 May 2021