Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

30 August 2018

Yuxuan Wang

Papers citing "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"

50 / 64 papers shown

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation Alexander H. Liu Sang-gil Lee Chao-Han Huck Yang Yuan Gong Yu-Chun Wang James Glass Rafael Valle Bryan Catanzaro SSL 52 0 0 02 Mar 2025
A multilingual training strategy for low resource Text to Speech Asma Amalas Mounir Ghogho Mohamed Chetouani Rachid Oulad Haj Thami 41 2 0 02 Sep 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data Takaaki Saeki Gary Wang Nobuyuki Morioka Isaac Elias Kyle Kastner ... Andrew Rosenberg Bhuvana Ramabhadran Heiga Zen Francoise Beaufays Hadar Shemtov 38 13 0 29 Feb 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization Wei-Ping Huang Sung-Feng Huang Hung-yi Lee 29 0 0 23 Jan 2024
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation Jianzong Wang Pengcheng Li Xulong Zhang Ning Cheng Jing Xiao 24 0 0 14 Nov 2023
Generative Pre-training for Speech with Flow Matching Alexander H. Liu Matt Le Apoorv Vyas Bowen Shi Andros Tjandra Wei-Ning Hsu 19 31 0 25 Oct 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning Haohan Guo Fenglong Xie Jiawen Kang Yujia Xiao Xixin Wu Helen M. Meng 30 3 0 31 Aug 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation K. Lakshminarayana C. Dittmar N. Pia Emanuel Habets 23 0 0 16 Jun 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training Zhe Ye Rongjie Huang Yi Ren Ziyue Jiang Jinglin Liu Jinzheng He Xiang Yin Zhou Zhao CLIP 26 20 0 18 May 2023
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation Yu-Kuan Fu Liang-Hsuan Tseng Jiatong Shi Chen An Li Tsung-Yuan Hsu Shinji Watanabe Hung-yi Lee 17 4 0 12 May 2023
Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages Seong-Hyun Park Myungseo Song Bohyung Kim Tae-Hyun Oh 22 1 0 28 Mar 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision Eugene Kharitonov Damien Vincent Zalan Borsos Raphaël Marinier Sertan Girgin Olivier Pietquin Matthew Sharifi Marco Tagliasacchi Neil Zeghidour 13 189 0 07 Feb 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study Massa Baali Tomoki Hayashi Hamdy Mubarak Soumi Maiti Shinji Watanabe W. El-Hajj Ahmed M. Ali 22 10 0 22 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Chengyi Wang Sanyuan Chen Yu-Huan Wu Zi-Hua Zhang Long Zhou ... Huaming Wang Jinyu Li Lei He Sheng Zhao Furu Wei 45 641 0 05 Jan 2023
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language Yusuke Yasuda T. Toda 25 8 0 16 Dec 2022
Learning the joint distribution of two sequences using little or no paired data Soroosh Mariooryad Matt Shannon Siyuan Ma Tom Bagby David Kao Daisy Stanton Eric Battenberg RJ Skerry-Ryan 17 2 0 06 Dec 2022
Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation Xin Yuan Robin Feng Mingming Ye 14 3 0 17 Nov 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao AI4TS 11 5 0 25 Oct 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Yifan Hu Pengkai Yin Rui Liu F. Bao Guanglai Gao 13 5 0 22 Sep 2022
AutoLV: Automatic Lecture Video Generator Wen Wang Yang Song Sanjay Jha VGen 16 3 0 19 Sep 2022
When Is TTS Augmentation Through a Pivot Language Useful? Nathaniel R. Robinson Perez Ogayo Swetha Gangu David R. Mortensen Shinji Watanabe 12 9 0 20 Jul 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data Naoki Makishima Satoshi Suzuki Atsushi Ando Ryo Masumura 142 4 0 11 Jul 2022
Building African Voices Perez Ogayo Graham Neubig A. Black 6 14 0 01 Jul 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder Eunwoo Song Ryuichi Yamamoto Ohsung Kwon Chan Song Min-Jae Hwang Suhyeon Oh Hyun-Wook Yoon Jin-Seob Kim Jae-Min Kim 35 7 0 30 Jun 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation Ryo Terashima Ryuichi Yamamoto Eunwoo Song Yuma Shirahata Hyun-Wook Yoon Jae-Min Kim Kentaro Tachibana 11 15 0 21 Apr 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech Guangyan Zhang Kaitao Song Xu Tan Daxin Tan Yuzi Yan ... G. Wang Wei Zhou Tao Qin Tan Lee Sheng Zhao SSL 20 21 0 31 Mar 2022
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module Adam Gabryś Goeric Huybrechts M. Ribeiro C. Chien Julian Roth Giulia Comini Roberto Barra-Chicote Bartek Perz Jaime Lorenzo-Trueba 28 21 0 16 Feb 2022
Distribution augmentation for low-resource expressive text-to-speech Mateusz Lajszczak Animesh Prasad Arent van Korlaar Bajibabu Bollepalli A. Bonafonte ... M. Nicolis Alexis Moinet Thomas Drugman Trevor Wood Elena Sokolova 25 7 0 13 Feb 2022
A study on the efficacy of model pre-training in developing neural text-to-speech system Guangyan Zhang Yichong Leng Daxin Tan Ying Qin Kaitao Song Xu Tan Sheng Zhao Tan Lee 27 2 0 08 Oct 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis Jinhyeok Yang Jaesung Bae Taejun Bak Young-Ik Kim Hoon-Young Cho 26 36 0 29 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech Raahil Shah Kamil Pokora Abdelhamid Ezzerg V. Klimkov Goeric Huybrechts Bartosz Putrycz Daniel Korzekwa Thomas Merritt 24 25 0 24 Jun 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework Jinyin Chen Linhui Ye Zhaoyan Ming 6 6 0 10 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 21 24 0 20 Apr 2021
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset Saida Mussakhojayeva Aigerim Janaliyeva A. Mirzakhmetov Yerbolat Khassanov H. A. Varol 9 14 0 17 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS Ye Jia Heiga Zen Jonathan Shen Yu Zhang Yonghui Wu SSL 22 81 0 28 Mar 2021
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis Mutian He Jingzhou Yang Lei He Frank Soong 15 18 0 05 Mar 2021
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input Brooke Stephenson Thomas Hueber Laurent Girin Laurent Besacier 36 10 0 19 Feb 2021
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement Hamed Hemati Damian Borth 6 9 0 12 Nov 2020
Low-resource expressive text-to-speech using data augmentation Goeric Huybrechts Thomas Merritt Giulia Comini Bartek Perz Raahil Shah Jaime Lorenzo-Trueba 18 50 0 11 Nov 2020
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS Rui Liu Berrak Sisman F. Bao Guanglai Gao Haizhou Li 9 17 0 11 Aug 2020
Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages Haitong Zhang Yue Lin 6 30 0 11 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition Jin Xu Xu Tan Yi Ren Tao Qin Jian Li Sheng Zhao Tie-Yan Liu VLM 16 90 0 09 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 24 73 0 04 Aug 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation Tao Tu Yuan-Jui Chen Alexander H. Liu Hung-yi Lee 25 7 0 16 May 2020
AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN Zewang Zhang Qiao Tian Heng Lu Ling-Hao Chen Shan Liu 7 27 0 12 May 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuanbin Cao Heiga Zen Yonghui Wu 9 130 0 06 Feb 2020
BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization Henry B. Moss Vatsal Aggarwal N. Prateek Javier I. González Roberto Barra-Chicote BDL 6 57 0 04 Feb 2020
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends S. Latif R. Rana Sara Khalifa Raja Jurdak Junaid Qadir Björn W. Schuller AI4TS 29 81 0 02 Jan 2020
Independent language modeling architecture for end-to-end ASR Van Tung Pham Haihua Xu Yerbolat Khassanov Zhiping Zeng Chng Eng Siong Chongjia Ni B. Ma Haizhou Li AuLLM 19 15 0 25 Nov 2019