CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech

30 April 2020

Daniel Sáez-Trigueros

Thomas Drugman

ArXiv (abs)PDF HTML

Papers citing "CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech"

25 / 25 papers shown

Title
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Mateusz Lajszczak Guillermo Cámbara Yang Li Fatih Beyhan Arent van Korlaar ... Bartosz Putrycz Soledad López Gambino Kayeon Yoo Elena Sokolova Thomas Drugman LM&MA 113 88 0 12 Feb 2024
Creating New Voices using Normalizing Flows Piotr Bilinski Thomas Merritt Abdelhamid Ezzerg Kamil Pokora Sebastian Cygert K. Yanagisawa Roberto Barra-Chicote Daniel Korzekwa 62 16 0 22 Dec 2023
Do Prosody Transfer Models Transfer Prosody? A. Sigurgeirsson Simon King DiffM 65 8 0 07 Mar 2023
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabry's Jaime Lorenzo-Trueba 67 5 0 29 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Weinan Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 108 27 0 29 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer S. Karlapati Penny Karanasou Mateusz Lajszczak Ammar Abbas Alexis Moinet Peter Makarov Raymond Li Arent van Korlaar Simon Slangen Thomas Drugman 80 15 0 27 Jun 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion Xintao Zhao Feng Liu Changhe Song Zhiyong Wu Shiyin Kang Deyi Tuo Helen Meng 85 21 0 24 Mar 2022
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module Adam Gabry's Goeric Huybrechts M. Ribeiro C. Chien Julian Roth Giulia Comini Roberto Barra-Chicote Bartek Perz Jaime Lorenzo-Trueba 80 21 0 16 Feb 2022
Distribution augmentation for low-resource expressive text-to-speech Mateusz Lajszczak Animesh Prasad Arent van Korlaar Bajibabu Bollepalli Antonio Bonafonte ... M. Nicolis Alexis Moinet Thomas Drugman Trevor Wood Elena Sokolova 61 7 0 13 Feb 2022
Cross-speaker style transfer for text-to-speech using data augmentation M. Ribeiro Julian Roth Giulia Comini Goeric Huybrechts Adam Gabry's Jaime Lorenzo-Trueba 68 21 0 10 Feb 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 86 11 0 23 Dec 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 64 4 0 19 Nov 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 63 17 0 07 Nov 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 69 47 0 14 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis Songxiang Liu Shan Yang Jane Polak Scowcroft Dong Yu AI4TS 62 10 0 08 Sep 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning Guangyan Zhang Ying Qin Daxin Tan Tan Lee 77 4 0 05 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis Julian Zaïdi Hugo Seuté Benjamin van Niekerk M. Carbonneau 61 21 0 04 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 92 23 0 27 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech Raahil Shah Kamil Pokora Abdelhamid Ezzerg V. Klimkov Goeric Huybrechts Bartosz Putrycz Daniel Korzekwa Thomas Merritt 64 26 0 24 Jun 2021
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis Chenpeng Du K. Yu 154 20 0 27 May 2021
EmoCat: Language-agnostic Emotional Voice Conversion Bastian Schnell Goeric Huybrechts Bartek Perz Thomas Drugman Jaime Lorenzo-Trueba 89 11 0 14 Jan 2021
Low-resource expressive text-to-speech using data augmentation Goeric Huybrechts Thomas Merritt Giulia Comini Bartek Perz Raahil Shah Jaime Lorenzo-Trueba 68 53 0 11 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 116 21 0 08 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 66 19 0 04 Nov 2020