Learning latent representations for style control and transfer in end-to-end speech synthesis

11 December 2018

Shifeng Pan

Papers citing "Learning latent representations for style control and transfer in end-to-end speech synthesis"

19 / 119 papers shown

Title | Authors | Tag | Counts | Date
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis | C. Chien, Hung-yi Lee | - | 91 36 0 | 12 Nov 2020
Low-resource expressive text-to-speech using data augmentation | Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba | - | 68 53 0 | 11 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement | Daxin Tan, Tan Lee | - | 116 21 0 | 08 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis | Guanghui Xu, Wei Song, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou | - | 62 50 0 | 06 Nov 2020
Paralinguistic Privacy Protection at the Edge | Ranya Aloufi, Hamed Haddadi, David E. Boyle | - | 64 14 0 | 04 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech | S. Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Panagiota Karanasou, Thomas Drugman | - | 66 19 0 | 04 Nov 2020
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis | Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda | BDL | 57 24 0 | 17 Sep 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis | Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie | - | 73 5 0 | 03 Aug 2020
Privacy-preserving Voice Analysis via Disentangled Representations | Ranya Aloufi, Hamed Haddadi, David E. Boyle | DRL | 130 58 0 | 29 Jul 2020
Pitchtron: Towards audiobook generation from ordinary people's voices | Sunghee Jung, Hoi-Rim Kim | - | 41 5 0 | 21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Seungwoo Choi, Seungju Han, Dongyoung Kim, S. Ha | - | 91 67 0 | 18 May 2020
CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech | S. Karlapati, Alexis Moinet, Arnaud Joly, V. Klimkov, Daniel Sáez-Trigueros, Thomas Drugman | - | 46 67 0 | 30 Apr 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech | Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao | - | 52 21 0 | 04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis | Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuanbin Cao, Heiga Zen, Yonghui Wu | - | 56 130 0 | 06 Feb 2020
Emotional speech synthesis with rich and granularized control | Seyun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, C. Ahn, Hong-Goo Kang | - | 85 90 0 | 05 Nov 2019
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech | Daniel Korzekwa, Roberto Barra-Chicote, B. Kostek, Thomas Drugman, Mateusz Lajszczak | - | 29 20 0 | 10 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech | V. Klimkov, S. Ronanki, Jonas Rohnke, Thomas Drugman | AI4TS | 89 82 0 | 04 Jul 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis | Eric Battenberg, Soroosh Mariooryad, Daisy Stanton, RJ Skerry-Ryan, Matt Shannon, David Kao, Tom Bagby | BDL | 107 45 0 | 08 Jun 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis | Yanyao Bian, Changbin Chen, Yongguo Kang, Zhenglin Pan | - | 77 46 0 | 04 Apr 2019