Robust and fine-grained prosody control of end-to-end speech synthesis

6 November 2018

Papers citing "Robust and fine-grained prosody control of end-to-end speech synthesis"

35 / 35 papers shown

Title
Cross-Utterance Conditioned VAE for Speech Generation Yong Li Cheng Yu Guangzhi Sun Weiqin Zu Zheng Tian ... Wei Pan Chao Zhang Jun Wang Yang Yang Fanglei Sun 29 2 0 08 Sep 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Xixin Wu Shiyin Kang Helen Meng 40 7 0 29 Jul 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents Suhee Jo Younggun Lee Yookyung Shin Yeongtae Hwang Taesu Kim 15 3 0 15 Mar 2023
Do Prosody Transfer Models Transfer Prosody? A. Sigurgeirsson Simon King DiffM 17 7 0 07 Mar 2023
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities Shijun Wang Jón Guðnason Damian Borth 42 8 0 02 Mar 2023
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis Karolos Nikitaras Konstantinos Klapsas Nikolaos Ellinas Georgia Maniati June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 19 0 0 01 Nov 2022
Speech Synthesis with Mixed Emotions Kun Zhou Berrak Sisman R. Rana B.W.Schuller Haizhou Li 27 45 0 11 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset Xiang Li Changhe Song X. Wei Zhiyong Wu Jia Jia Helen Meng 29 4 0 10 Aug 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer S. Karlapati Penny Karanasou Mateusz Lajszczak Ammar Abbas Alexis Moinet Peter Makarov Raymond Li Arent van Korlaar Simon Slangen Thomas Drugman 33 15 0 27 Jun 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022 Jiameng Gao 35 0 0 08 Apr 2022
Unsupervised word-level prosody tagging for controllable speech synthesis Yiwei Guo Chenpeng Du Kai Yu 26 15 0 15 Feb 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 32 73 0 17 Jan 2022
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 19 11 0 19 Nov 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis Li-Wei Chen Alexander I. Rudnicky 88 30 0 12 Oct 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 25 22 0 27 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 23 353 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 26 3 0 21 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 27 16 0 15 Jun 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis Xiang Li Changhe Song Jingbei Li Zhiyong Wu Jia Jia Helen Meng 25 47 0 08 Apr 2021
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure Jui Shah Yaman Kumar Singla Changyou Chen R. Shah 38 81 0 02 Jan 2021
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Lingwei Kong Jing Xiao 16 9 0 03 Dec 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 27 55 0 17 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis C. Chien Hung-yi Lee 32 36 0 12 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 34 21 0 08 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 28 19 0 04 Nov 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features T. Raitio Ramya Rasipuram D. Castellani 42 66 0 14 Sep 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 42 73 0 04 Aug 2020
Pitchtron: Towards audiobook generation from ordinary people's voices Sunghee Jung Hoi-Rim Kim 16 5 0 21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 37 66 0 18 May 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Jing Xiao 24 21 0 04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuanbin Cao Heiga Zen Yonghui Wu 18 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 36 92 0 06 Feb 2020
Teacher-Student Training for Robust Tacotron-based TTS Rui Liu Berrak Sisman Jingdong Li F. Bao Guanglai Gao Haizhou Li 19 38 0 07 Nov 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis Eric Battenberg RJ Skerry-Ryan Soroosh Mariooryad Daisy Stanton David Kao Matt Shannon Tom Bagby 33 113 0 23 Oct 2019
Direct speech-to-speech translation with a sequence-to-sequence model Ye Jia Ron J. Weiss Fadi Biadsy Wolfgang Macherey Melvin Johnson Zhehuai Chen Yonghui Wu 23 223 0 12 Apr 2019