v1v2 (latest)

Learning latent representations for style control and transfer in end-to-end speech synthesis

11 December 2018

Shifeng Pan

Papers citing "Learning latent representations for style control and transfer in end-to-end speech synthesis"

50 / 119 papers shown

Title
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 112 40 0 30 May 2022
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence Sangshin Oh Seyun Um Hong-Goo Kang BDL DRL 41 2 0 09 May 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation Ryo Terashima Ryuichi Yamamoto Eunwoo Song Yuma Shirahata Hyun-Wook Yoon Jae-Min Kim Kentaro Tachibana 52 16 0 21 Apr 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Jiankun Hu Zhiyong Wu Shiyin Kang Helen Meng 79 10 0 06 Apr 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Shiyin Kang Helen Meng 52 12 0 23 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis Pengyu Cheng Zhenhua Ling 72 3 0 02 Mar 2022
Cross-speaker style transfer for text-to-speech using data augmentation M. Ribeiro Julian Roth Giulia Comini Goeric Huybrechts Adam Gabry's Jaime Lorenzo-Trueba 66 21 0 10 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 155 18 0 24 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 79 75 0 17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 172 58 0 10 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 86 11 0 23 Dec 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition Haoran Sun Lantian Li Tianshi Zheng Dong Wang CVBM 51 0 0 24 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 60 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 144 9 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 62 4 0 19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 72 18 0 19 Nov 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech Mu Li Jonas Rohnke Antonio Bonafonte Mateusz Lajszczak Trevor Wood DRL 96 2 0 24 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation Fengyu Yang Jian Luan Yujun Wang 137 5 0 19 Oct 2021
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding Sergey Nikonorov Berrak Sisman Mingyang Zhang Haizhou Li 39 3 0 13 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis Cheng Gong Longbiao Wang Zhenhua Ling Ju Zhang Jianwu Dang 48 5 0 09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 62 16 0 08 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 78 23 0 06 Oct 2021
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning Rui Li dong Pu Minnie Huang Bill Huang 86 14 0 23 Sep 2021
LDC-VAE: A Latent Distribution Consistency Approach to Variational AutoEncoders Xiaoyu Chen Chen Gong Qiang He Xinwen Hou Yu Liu 77 1 0 22 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 67 47 0 14 Sep 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis Julian Zaïdi Hugo Seuté Benjamin van Niekerk M. Carbonneau 61 21 0 04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis Xudong Dai Cheng Gong Longbiao Wang Kaili Zhang 34 2 0 04 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 92 23 0 27 Jul 2021
On Prosody Modeling for ASR+TTS based Voice Conversion Wen-Chin Huang Tomoki Hayashi Xinjian Li Shinji Watanabe Tomoki Toda 73 9 0 20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 68 17 0 17 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis Taejun Bak Jaesung Bae Hanbin Bae Young-Ik Kim Hoon-Young Cho 120 17 0 29 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Jane Polak Scowcroft 73 30 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 119 9 0 18 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion Zhichao Wang Xinyong Zhou Fengyu Yang Tao Li Hongqiang Du Lei Xie Wendong Gan Haitao Chen Hai Li 65 22 0 16 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 74 7 0 14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 167 902 0 11 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS Liping Chen Yan Deng Xi Wang Frank Soong Lei He 89 23 0 08 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis Shakti Kumar Jithin Pradeep Hussain Zaidi DRL 68 6 0 10 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 94 25 0 20 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis Yixuan Zhou Changhe Song Jingbei Li Zhiyong Wu Yanyao Bian Jane Polak Scowcroft Helen Meng 103 6 0 14 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability Rui Liu Berrak Sisman Haizhou Li 69 32 0 03 Apr 2021
Energy Disaggregation using Variational Autoencoders A. Langevin M. Carbonneau M. Cheriet G. Gagnon DRL 57 59 0 22 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee Kyumin Park Daeyoung Kim 69 32 0 17 Mar 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Wei-Ning Hsu David Harwath Christopher Song James R. Glass CLIP 90 67 0 31 Dec 2020
Parallel WaveNet conditioned on VAE latent vectors Jonas Rohnke Thomas Merritt Jaime Lorenzo-Trueba Adam Gabry's Vatsal Aggarwal Alexis Moinet Roberto Barra-Chicote 74 3 0 17 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis Neeraj Kumar Srishti Goel Ankur Narang Brejesh Lall 68 5 0 14 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Lingwei Kong Jing Xiao 62 9 0 03 Dec 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 79 74 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 88 56 0 17 Nov 2020