Expressive TTS Training with Frame and Style Reconstruction Loss

4 August 2020

Haizhou Li

Papers citing "Expressive TTS Training with Frame and Style Reconstruction Loss"


DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing Neha Sahipjohn Ashishkumar Gudmalwar Nirmesh Shah Pankaj Wasnik R. Shah 43 5 0 13 Jun 2024
Exploring speech style spaces with language models: Emotional TTS without emotion labels Shreeram Suresh Chandra Zongyang Du Berrak Sisman 38 2 0 18 May 2024
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling Rui Liu Yifan Hu Yi Ren Xiang Yin Haizhou Li 37 16 0 19 Dec 2023
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency Rui Liu Jiatian Xi Ziyue Jiang Haizhou Li 9 2 0 21 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling Zhichao Wang Xinsheng Wang Qicong Xie Tao Li Linfu Xie Qiao Tian Yuping Wang 13 4 0 03 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin Tao Li Chenxu Hu Jian Cong Xinfa Zhu Jingbei Li Qiao Tian Yuping Wang Linfu Xie DiffM 24 8 0 02 Sep 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training H. Oh Sang-Hoon Lee Seong-Whan Lee DiffM 15 14 0 31 Jul 2023
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis Haobin Tang Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao DiffM 14 22 0 01 Jun 2023
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion Rui Liu Jinhua Zhang Guanglai Gao Haizhou Li 18 9 0 25 May 2023
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion Zhichao Wang Liumeng Xue Qiuqiang Kong Linfu Xie Yuan-Jui Chen Qiao Tian Yuping Wang BDL 9 3 0 12 May 2023
Accented Text-to-Speech Synthesis with Limited Data Xuehao Zhou Mingyang Zhang Yi Zhou Zhizheng Wu Haizhou Li 29 11 0 08 May 2023
Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker Navjot Kaur Paige Tuttosi 16 2 0 29 Jan 2023
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints Zhichao Wang Xinsheng Wang Linfu Xie Yuan-Jui Chen Qiao Tian Yuping Wang 22 5 0 16 Nov 2022
Explicit Intensity Control for Accented Text-to-speech Rui Liu Haolin Zuo De Hu Guanglai Gao Haizhou Li 16 6 0 27 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos Björn W. Schuller Gökçe İymen M. Sezgin Xiangheng He ... Shuo Liu Silvan Mertes Elisabeth André Ruibo Fu Jianhua Tao 15 53 0 06 Oct 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 21 6 0 22 Sep 2022
Automatic Prosody Annotation with Pre-Trained Text-Speech Model Ziqian Dai Jianwei Yu Yan Wang Nuo Chen Yanyao Bian Guangzhi Li Deng Cai Dong Yu 108 7 0 16 Jun 2022
NatiQ: An End-to-end Text-to-Speech System for Arabic Ahmed Abdelali Nadir Durrani C. Demiroğlu Fahim Dalvi Hamdy Mubarak Kareem Darwish 13 14 0 15 Jun 2022
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning Rui Liu Berrak Sisman Björn Schuller Guanglai Gao Haizhou Li 19 11 0 15 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 33 38 0 30 May 2022
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts Paige Tuttosi Emma Hughson Akihiro Matsufuji Angelica Lim 20 4 0 10 May 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 54 18 0 24 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 52 54 0 10 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 11 11 0 23 Dec 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Junchen Lu Berrak Sisman Rui Liu Mingyang Zhang Haizhou Li DiffM 32 19 0 07 Oct 2021
StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis Rui Liu Berrak Sisman Haizhou Li 19 2 0 07 Oct 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer Zongyang Du Berrak Sisman Kun Zhou Haizhou Li 16 20 0 08 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 26 9 0 18 Jun 2021
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows Iván Vallés-Pérez Julian Roth Grzegorz Beringer Roberto Barra-Chicote J. Droppo 21 8 0 10 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD Kun Zhou Berrak Sisman Rui Liu Haizhou Li 23 167 0 31 May 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability Rui Liu Berrak Sisman Haizhou Li 21 32 0 03 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training Kun Zhou Berrak Sisman Haizhou Li 10 27 0 31 Mar 2021
Adversarially learning disentangled speech representations for robust multi-factor voice conversion Jie Wang Jingbei Li Xintao Zhao Zhiyong Wu Shiyin Kang H. Meng DRL 29 29 0 30 Jan 2021
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 11 21 0 08 Nov 2020
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech Kun Zhou Berrak Sisman Haizhou Li DRL 11 40 0 03 Nov 2020
Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech Yeunju Choi Youngmoon Jung Youngjoo Suh Hoirin Kim 6 6 0 02 Nov 2020
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset Kun Zhou Berrak Sisman Rui Liu Haizhou Li 10 185 0 28 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis Rui Liu Berrak Sisman Haizhou Li 18 24 0 23 Oct 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning Berrak Sisman Junichi Yamagishi Simon King Haizhou Li BDL 27 316 0 09 Aug 2020