CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

17 May 2019

Papers citing "CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network"

43 / 43 papers shown

Title
PRESENT: Zero-Shot Text-to-Prosody Control Perry Lam Huayun Zhang Nancy F. Chen Berrak Sisman Dorien Herremans 43 0 0 13 Aug 2024
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer Himanshu Maurya A. Sigurgeirsson 25 0 0 06 Jun 2024
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo Hyukhun Koh Kyomin Jung DiffM 39 0 0 23 Oct 2023
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion Yimin Deng Huaizhen Tang Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao 13 7 0 21 Aug 2023
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech Guangyan Zhang Thomas Merritt M. Ribeiro Biel Tura Vecino K. Yanagisawa ... Ammar Abbas Piotr Bilinski Roberto Barra-Chicote Daniel Korzekwa Jaime Lorenzo-Trueba DiffM 31 3 0 31 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review J. Barnett 16 25 0 07 Jul 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis Chunyu Qiang Peng Yang Hao Che Ying Zhang Xiaorui Wang Zhong-ming Wang 38 9 0 14 Mar 2023
Emotion Selectable End-to-End Text-based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen Chu Yuan Zhang 30 2 0 20 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis Chunyu Qiang Peng Yang Hao Che Xiaorui Wang Zhongyuan Wang BDL 21 6 0 13 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations Nikolaos Ellinas Myrsini Christidou Alexandra Vioni June Sig Sung Aimilios Chalamandaris Pirros Tsiakoulis P. Mastorocostas 17 7 0 29 Nov 2022
Speech Synthesis with Mixed Emotions Kun Zhou Berrak Sisman R. Rana B.W.Schuller Haizhou Li 14 43 0 11 Aug 2022
Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning Trevor A. McInroe Lukas Schafer Stefano V. Albrecht 20 4 0 22 Jun 2022
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity Sungjae Kim Y.E. Kim Jewoo Jun Injung Kim 29 13 0 02 Mar 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 17 72 0 17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 52 54 0 10 Jan 2022
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 8 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 22 9 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 14 4 0 19 Nov 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 32 22 0 06 Oct 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning Guangyan Zhang Ying Qin Daxin Tan Tan Lee 22 4 0 05 Aug 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 24 15 0 17 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangens S. Karlapati Thomas Drugman 16 0 0 29 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 23 3 0 21 Jun 2021
Global Rhythm Style Transfer Without Text Transcriptions Kaizhi Qian Yang Zhang Shiyu Chang Jinjun Xiong Chuang Gan David D. Cox M. Hasegawa-Johnson 26 20 0 16 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 20 16 0 15 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 18 24 0 20 Apr 2021
Clockwork Variational Autoencoders Vaibhav Saxena Jimmy Ba Danijar Hafner VGen DRL 21 49 0 18 Feb 2021
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis C. Chien Hung-yi Lee 19 36 0 12 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 19 21 0 08 Nov 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling Jonathan Shen Ye Jia Mike Chrzanowski Yu Zhang Isaac Elias Heiga Zen Yonghui Wu 14 112 0 08 Oct 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features T. Raitio Ramya Rasipuram D. Castellani 24 66 0 14 Sep 2020
RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications Adriana Stan 21 5 0 11 Sep 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 22 73 0 04 Aug 2020
Pitchtron: Towards audiobook generation from ordinary people's voices Sunghee Jung Hoi-Rim Kim 6 5 0 21 May 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation Tao Tu Yuan-Jui Chen Alexander H. Liu Hung-yi Lee 17 7 0 16 May 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck Kaizhi Qian Yang Zhang Shiyu Chang David D. Cox M. Hasegawa-Johnson 4 177 0 23 Apr 2020
Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0 Zack Hodari Catherine Lai Simon King 6 13 0 14 Mar 2020
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features Siddharth Gururani Kilol Gupta D. Shah Z. Shakeri Jervis Pinto 4 15 0 21 Nov 2019
Improving Universal Sound Separation Using Sound Classification Efthymios Tzinis Scott Wisdom J. Hershey A. Jansen D. Ellis VLM 21 73 0 18 Nov 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 9 48 0 03 Oct 2019
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs R. Clark Hanna Silén Tom Kenter Ralph Leith ELM 10 44 0 09 Sep 2019
Using generative modelling to produce varied intonation for speech synthesis Zack Hodari O. Watts Simon King 21 29 0 10 Jun 2019