Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

2 December 2019

Jonas Rohnke

Papers citing "Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection"


Title
On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Study Hyouin Liu Zhikuan Zhang 24 0 0 12 May 2025
Audio-Visual Neural Syntax Acquisition Cheng-I Jeff Lai Freda Shi Puyuan Peng Yoon Kim Kevin Gimpel ... David D. Cox David F. Harwath Yang Zhang Karen Livescu James R. Glass CLIP NAI 45 1 0 11 Oct 2023
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder Xuyuan Li Zengqiang Shang Peiyang Shi Hua Hua Jian Liu Pengyuan Zhang 21 0 0 25 Aug 2023
Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences Yuan Tseng Cheng-I Jeff Lai Hung-yi Lee SSL 35 4 0 15 Mar 2023
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language Yusuke Yasuda T. Toda 25 8 0 16 Dec 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS Liumeng Xue Frank Soong Shaofei Zhang Linfu Xie 19 23 0 14 Sep 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 8 4 0 03 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabryś Jaime Lorenzo-Trueba 19 5 0 29 Jul 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer S. Karlapati Penny Karanasou Mateusz Lajszczak Ammar Abbas Alexis Moinet Peter Makarov Raymond Li Arent van Korlaar Simon Slangen Thomas Drugman 14 15 0 27 Jun 2022
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech Mu-Wei Li Jonas Rohnke A. Bonafonte Mateusz Lajszczak Trevor Wood DRL 17 2 0 24 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis Cheng Gong Longbiao Wang Zhenhua Ling Ju Zhang J. Dang 9 5 0 09 Oct 2021
Location, Location: Enhancing the Evaluation of Text-to-Speech Synthesis Using the Rapid Prosody Transcription Paradigm Elijah Gutierrez Pilar Oplustil Gallegos Catherine Lai 13 3 0 06 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangen S. Karlapati Thomas Drugman 16 0 0 29 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 12 7 0 14 Jun 2021
EmoCat: Language-agnostic Emotional Voice Conversion Bastian Schnell Goeric Huybrechts Bartek Perz Thomas Drugman Jaime Lorenzo-Trueba 11 10 0 14 Jan 2021
Using previous acoustic context to improve Text-to-Speech synthesis Pilar Oplustil Gallegos Simon King 13 11 0 07 Dec 2020
Low-resource expressive text-to-speech using data augmentation Goeric Huybrechts Thomas Merritt Giulia Comini Bartek Perz Raahil Shah Jaime Lorenzo-Trueba 13 50 0 11 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 15 19 0 04 Nov 2020
Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0 Zack Hodari Catherine Lai Simon King 6 13 0 14 Mar 2020
Probing the phonetic and phonological knowledge of tones in Mandarin TTS models Jian Zhu 11 8 0 23 Dec 2019