Title
Speech Synthesis as Augmentation for Low-Resource ASR Deblin Bagchi Shannon Wotherspoon Zhuolin Jiang P. Muthukumar 27 2 0 23 Dec 2020
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 55 16 0 23 Dec 2020
Parallel WaveNet conditioned on VAE latent vectors Jonas Rohnke Thomas Merritt Jaime Lorenzo-Trueba Adam Gabryś Vatsal Aggarwal Alexis Moinet Roberto Barra-Chicote 74 3 0 17 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis Neeraj Kumar Srishti Goel Ankur Narang Brejesh Lall 68 5 0 14 Dec 2020
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal Changhe Song Jingbei Li Yixuan Zhou Zhiyong Wu Helen Meng 49 6 0 13 Dec 2020
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis Anurag Chowdhury Arun Ross Prabu David 38 5 0 09 Dec 2020
Using previous acoustic context to improve Text-to-Speech synthesis Pilar Oplustil Gallegos Simon King 70 11 0 07 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture Chenfeng Miao Shuang Liang Zhencheng Liu Minchuan Chen Jun Ma Shaojun Wang Jing Xiao 67 38 0 07 Dec 2020
Text-to-speech for the hearing impaired Josef Schlittenlacher T. Baer 32 0 0 03 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution Zhen Zeng Jianzong Wang Ning Cheng Jing Xiao 44 8 0 03 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Lingwei Kong Jing Xiao 62 9 0 03 Dec 2020
FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge Bichen Wu Qing He Peizhao Zhang T. Koehler Kurt Keutzer Peter Vajda 47 6 0 25 Nov 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech Yiling Huang Yutian Chen Jason W. Pelecanos Quan Wang 100 12 0 24 Nov 2020
Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification Xiaoyi Qin Yaogen Yang Lin Yang Xuyang Wang Junjie Wang Ming Li 49 0 0 21 Nov 2020
Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder Sam Davis Giuseppe Coccia Sam Gooch Julian Mack 36 0 0 20 Nov 2020
DeepRepair: Style-Guided Repairing for DNNs in the Real-world Operational Environment Bing Yu Hua Qi Qing Guo Felix Juefei-Xu Xiaofei Xie Lei Ma Jianjun Zhao 25 5 0 19 Nov 2020
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains Won Jang D. Lim Jaesam Yoon 60 34 0 19 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 79 74 0 17 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis Xi Wang Huaiping Ming Lei He Frank Soong 43 5 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 88 56 0 17 Nov 2020
Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher Heyang Xue Shan Yang Yinjiao Lei Lei Xie Xiulin Li 45 11 0 17 Nov 2020
Speech Prediction in Silent Videos using Variational Autoencoders Ravindra Yadav Ashish Sardana Vinay P. Namboodiri R. Hegde VGen DRL 63 23 0 14 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis C. Chien Hung-yi Lee 91 36 0 12 Nov 2020
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement Hamed Hemati Damian Borth 72 9 0 12 Nov 2020
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model Haoyu Li Yang Ai Junichi Yamagishi 76 2 0 10 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis Erica Cooper Xin Wang Yi Zhao Yusuke Yasuda Junichi Yamagishi SyDa 50 3 0 10 Nov 2020
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation Yang Ai Haoyu Li Xin Wang Junichi Yamagishi Zhenhua Ling 47 4 0 08 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 116 21 0 08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 99 101 0 06 Nov 2020
Large-scale multilingual audio visual dubbing Yi Yang Brendan Shillingford Yannis Assael Miaosen Wang Wendi Liu ... Eren Sezener Luis C. Cobo Misha Denil Y. Aytar Nando de Freitas 70 21 0 06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis Guanghui Xu Wei Song Zhengchen Zhang Chao Zhang Xiaodong He Bowen Zhou 62 50 0 06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 66 19 0 04 Nov 2020
Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech Yeunju Choi Youngmoon Jung Youngjoo Suh Hoirin Kim 129 6 0 02 Nov 2020
FeatherTTS: Robust and Efficient attention based Neural TTS Qiao Tian Zewang Zhang Chao-Jung Liu Heng Lu Linghui Chen Bin Wei P. He Shan Liu 69 4 0 02 Nov 2020
PPG-based singing voice conversion with adversarial representation learning Zhonghao Li Benlai Tang Xiang Yin Yuan Wan Linjia Xu Chen Shen Zejun Ma 59 37 0 28 Oct 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention Yist Y. Lin C. Chien Jheng-hao Lin Hung-yi Lee Lin-Shan Lee 60 79 0 27 Oct 2020
Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020 H.C.M. Turner Giulio Lovisotto Ivan Martinovic 73 21 0 26 Oct 2020
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis Min-Jae Hwang Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 44 32 0 26 Oct 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition Xiong Cai Dongyang Dai Zhiyong Wu Xiang Li Jingbei Li Helen Meng 96 67 0 26 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis Rui Liu Berrak Sisman Haizhou Li 96 25 0 23 Oct 2020
Show and Speak: Directly Synthesize Spoken Description of Images Xinsheng Wang Siyuan Feng Jihua Zhu M. Hasegawa-Johnson O. Scharenborg 152 4 0 23 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines Yao Shi Hui Bu Xin Xu Shaojing Zhang Ming Li 112 223 0 22 Oct 2020
The NTU-AISG Text-to-speech System for Blizzard Challenge 2020 Haobo Zhang Tingzhi Mao Haihua Xu Hao-Ming Huang 76 1 0 22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 76 103 0 22 Oct 2020
Learning Speaker Embedding from Text-to-Speech Jaejin Cho Piotr Żelasko Jesus Villalba Shinji Watanabe Najim Dehak 66 11 0 21 Oct 2020
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems Antoine Perquin Erica Cooper Junichi Yamagishi 27 1 0 21 Oct 2020
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training Renjie Zheng Mingbo Ma Baigong Zheng Kaibo Liu Jiahong Yuan Kenneth Church Liang Huang 76 14 0 20 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE Yusuke Yasuda Xin Wang Junichi Yamagishi 66 17 0 19 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion Shengkui Zhao Trung Hieu Nguyen Hao Wang B. Ma 60 25 0 16 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 183 1,954 0 12 Oct 2020