Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning. Yangui Fang, Jing Peng, Xu Li, Yu Xi, Chengwei Zhang, Guohui Zhong, Kai Yu. 06 Jun 2025.
Recent Advances in Speech Language Models: A Survey. Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King. [AuLLM]. 01 Oct 2024.
Progressive distillation diffusion for raw music generation. Svetlana Pavlova. [DiffM]. 20 Jul 2023.
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge. Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux. [SSL]. 27 Oct 2022.
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings. Jian Zhu, Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo. [SSL]. 23 Oct 2022.
An Initial study on Birdsong Re-synthesis Using Neural Vocoders. Rhythm Bhatia, Tomi Kinnunen. 21 Sep 2022.
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network. Da-Rong Liu, Po-Chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-yi Lee. [GAN]. 29 Jul 2022.
Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE. Marc-Antoine Georges, J. Schwartz, Thomas Hueber. [SSL]. 17 Jun 2022.
Self-Supervised Speech Representation Learning: A Review. Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe. [SSL, AI4TS]. 21 May 2022.
Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation. Marc-Antoine Georges, Julien Diard, Laurent Girin, J. Schwartz, Thomas Hueber. 05 Apr 2022.
A Brief Overview of Unsupervised Neural Speech Representation Learning. Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel. [BDL, AI4TS, SSL]. 01 Mar 2022.
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring. Herman Kamper. 24 Feb 2022.
textless-lib: a Library for Textless Spoken Language Processing. Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Nguyen, Paden Tomasello, ..., A. Elkahky, Wei-Ning Hsu, Abdel-rahman Mohamed, Emmanuel Dupoux, Yossi Adi. 15 Feb 2022.
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics. Saida Mussakhojayeva, Yerbolat Khassanov, H. A. Varol. 15 Jan 2022.
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding. Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak. [SSL]. 05 Oct 2021.
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing. Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper. [SSL]. 02 Aug 2021.
ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition. Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristià, Emmanuel Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen, Chen Yu. 14 Jul 2021.
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance. Hieu-Thi Luong, Junichi Yamagishi. 25 Jun 2021.
Unsupervised Automatic Speech Recognition: A Review. Hanan Aldarmaki, Asad Ullah, Nazar Zaki. [VLM, SSL]. 09 Jun 2021.
Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery. Thomas Glarner, Janek Ebbers, Reinhold Häb-Umbach. [DRL]. 04 May 2021.
Protecting gender and identity with disentangled speech representations. Dimitrios Stoidis, Andrea Cavallaro. 22 Apr 2021.
Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation. C. Jacobs, Yevgen Matusevych, Herman Kamper. 19 Mar 2021.
Generative Spoken Language Modeling from Raw Audio. Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, ..., Tu Nguyen, Jade Copet, Alexei Baevski, A. Mohamed, Emmanuel Dupoux. [AuLLM]. 01 Feb 2021.
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. Wei-Ning Hsu, David Harwath, Christopher Song, James R. Glass. [CLIP]. 31 Dec 2020.
The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks. Siyuan Feng, O. Scharenborg. [SSL]. 17 Dec 2020.
Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks. Herman Kamper, Benjamin van Niekerk. [SSL, MQ]. 14 Dec 2020.
A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings. Lisa van Staden, Herman Kamper. [SSL]. 14 Dec 2020.
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling. Tu Nguyen, Maureen de Seyssel, Patricia Roze, M. Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux. [SSL]. 23 Nov 2020.
A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery. Bolaji Yusuf, Lucas Ondel, L. Burget, J. Černocký, Murat Saraclar. 04 Nov 2020.
The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units. Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, S. Sakti, Emmanuel Dupoux. [SSL]. 12 Oct 2020.
Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication. Gašper Beguš. [GAN, SSL]. 13 Sep 2020.
Exploration of End-to-end Synthesisers for Zero Resource Speech Challenge 2020. Karthik Pandia D.S., Anusha Prakash, M. M. H. Murthy. 10 Sep 2020.
Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders. Mingjie Chen, Thomas Hain. [SSL, DRL]. 16 Aug 2020.
Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics. Okko Räsänen, María Andrea Cruz Blandón. 03 Aug 2020.
Evaluating the reliability of acoustic speech embeddings. Robin Algayres, Mohamed Salah Zaiem, Benoît Sagot, Emmanuel Dupoux. 27 Jul 2020.
Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery. Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak. [SSL]. 26 Jul 2020.
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain. Eugene Kharitonov, M. Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux. 02 Jul 2020.
UWSpeech: Speech to Speech Translation for Unwritten Languages. Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Ke-jun Zhang, Tie-Yan Liu. 14 Jun 2020.
CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks. Gašper Beguš. [GAN]. 04 Jun 2020.
CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning. Sameer Khurana, Antoine Laurent, James R. Glass. [SSL]. 04 Jun 2020.
Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge. Andros Tjandra, S. Sakti, Satoshi Nakamura. 24 May 2020.
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge. Benjamin van Niekerk, Leanne Nortje, Herman Kamper. 19 May 2020.
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization. Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang. [GAN]. 18 May 2020.
Robust Training of Vector Quantized Bottleneck Models. A. Lancucki, J. Chorowski, Guillaume Sanchez, R. Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent. 18 May 2020.
DiscreTalk: Text-to-Speech as a Machine Translation Problem. Tomoki Hayashi, Shinji Watanabe. 12 May 2020.
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise. Shan Yang, Yuxuan Wang, Lei Xie. 28 Apr 2020.
Learning Robust and Multilingual Speech Representations. Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, Aaron van den Oord. [SSL]. 29 Jan 2020.
Unsupervised Pre-training of Bidirectional Speech Encoders via Masked Reconstruction. Weiran Wang, Qingming Tang, Karen Livescu. [SSL]. 28 Jan 2020.
Libri-Light: A Benchmark for ASR with Limited or No Supervision. Jacob Kahn, M. Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, ..., Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux. [AuLLM]. 17 Dec 2019.
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. David Harwath, Wei-Ning Hsu, James R. Glass. 21 Nov 2019.