FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

8 June 2020

Xu Tan

Zhou Zhao

Papers citing "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

50 / 754 papers shown

Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy Sarina Meyer Pascal Tilli Pavel Denisov Florian Lux Julia Koch Ngoc Thang Vu 23 31 0 13 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages Thibault Sellam Ankur Bapna Joshua Camp Diana Mackinnon Ankur P. Parikh Jason Riesa 30 17 0 12 Oct 2022
PARAGEN: A Parallel Generation Toolkit Jiangtao Feng Yi Zhou Jun Zhang Xian Qian Liwei Wu Zhexi Zhang Yanming Liu Mingxuan Wang Lei Li Hao Zhou VLM 30 3 0 07 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos Björn W. Schuller Gökçe İymen M. Sezgin Xiangheng He ... Shuo Liu Silvan Mertes Elisabeth André Ruibo Fu Jianhua Tao 20 53 0 06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data Ye Zhu Yuehua Wu N. Sebe Yan Yan 33 16 0 05 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration Yuma Koizumi Kohei Yatabe Heiga Zen M. Bacchiani DiffM 42 29 0 03 Oct 2022
Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition Jash Rathod Nauman Dawalatabad Shatrughan Singh Dhananjaya N. Gowda 17 9 0 01 Oct 2022
AudioGen: Textually Guided Audio Generation Felix Kreuk Gabriel Synnaeve Adam Polyak Uriel Singer Alexandre Défossez Jade Copet Devi Parikh Yaniv Taigman Yossi Adi DiffM 27 289 0 30 Sep 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech Yusuke Nakai Yuki Saito K. Udagawa Hiroshi Saruwatari AAML 17 1 0 26 Sep 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Yifan Hu Pengkai Yin Rui Liu F. Bao Guanglai Gao 18 5 0 22 Sep 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 29 6 0 22 Sep 2022
Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN Yin-Ping Cho Yu Tsao Hsin-Min Wang Yi-Wen Liu DiffM 35 8 0 21 Sep 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings M. Rahman M. Graciarena Diego Castán Chris Cobo-Kroenke Mitchell McLaren A. Lawson AAML 25 9 0 15 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS Liumeng Xue Frank Soong Shaofei Zhang Linfu Xie 19 23 0 14 Sep 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks L. Finkelstein Heiga Zen Norman Casagrande Chun-an Chan Ye Jia ... Jonathan Shen V. Wan Yu Zhang Yonghui Wu R. Clark 17 9 0 28 Aug 2022
TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training Huaizhen Tang Xulong Zhang Jianzong Wang Ning Cheng Zhen Zeng Edward Xiao Jing Xiao 16 20 0 08 Aug 2022
Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features Jun Xue Cunhang Fan Zhao Lv J. Tao Jiangyan Yi C. Zheng Zhengqi Wen Minmin Yuan S. Shao 28 31 0 02 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabryś Jaime Lorenzo-Trueba 27 5 0 29 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 44 194 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 28 10 0 13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 17 14 0 13 Jul 2022
A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System Yi-Chiao Wu Patrick Lumban Tobing Kazuki Yasuhara Noriyuki Matsunaga Yamato Ohtani T. Toda 39 0 0 13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection Haoxin Ma Jiangyan Yi Chenglong Wang Xin Yan J. Tao Tao Wang Shiming Wang Ruibo Fu 16 26 0 12 Jul 2022
PoeticTTS -- Controllable Poetry Reading for Literary Studies Julia Koch Florian Lux Nadja Schauffler T. Bernhart Felix Dieterle Jonas Kuhn Sandra Richter Gabriel Viehhauser Ngoc Thang Vu 22 5 0 11 Jul 2022
Speaker Anonymization with Phonetic Intermediate Representations Sarina Meyer Florian Lux Pavel Denisov Julia Koch Pascal Tilli Ngoc Thang Vu 21 27 0 11 Jul 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data Naoki Makishima Satoshi Suzuki Atsushi Ando Ryo Masumura 142 4 0 11 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu Rui Xue Lei He Xu Tan Sheng Zhao 23 24 0 11 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqiang Wang Zhou Zhao 19 10 0 08 Jul 2022
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training Zewang Zhang Yibin Zheng Xinhui Li Li Lu DiffM 23 11 0 05 Jul 2022
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model Brooke Stephenson Laurent Besacier Laurent Girin Thomas Hueber 12 8 0 04 Jul 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech Keon Lee Kyumin Park Daeyoung Kim LM&MA 16 42 0 03 Jul 2022
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems Hyun-Wook Yoon Ohsung Kwon Hoyeon Lee Ryuichi Yamamoto Eunwoo Song Jae-Min Kim Min-Jae Hwang 29 15 0 30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder Eunwoo Song Ryuichi Yamamoto Ohsung Kwon Chan Song Min-Jae Hwang Suhyeon Oh Hyun-Wook Yoon Jin-Seob Kim Jae-Min Kim 35 7 0 30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Wenbo Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 50 26 0 29 Jun 2022
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody Peter Makarov Ammar Abbas Mateusz Lajszczak Arnaud Joly S. Karlapati Alexis Moinet Thomas Drugman Penny Karanasou 8 16 0 29 Jun 2022
Expressive, Variable, and Controllable Duration Modelling in TTS Ammar Abbas Thomas Merritt Alexis Moinet S. Karlapati Ewa Muszyńska Simon Slangen Elia Gatti Thomas Drugman 28 10 0 28 Jun 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder Taejun Bak Junmo Lee Hanbin Bae Jinhyeok Yang Jaesung Bae Young-Sun Joo 23 27 0 27 Jun 2022
Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding Wei-Ping Huang Po-Chun Chen Sung-Feng Huang Hung-yi Lee 19 1 0 27 Jun 2022
Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations Chin-Cheng Hsu 11 7 0 25 Jun 2022
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech Florian Lux Julia Koch Ngoc Thang Vu 32 19 0 24 Jun 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis Tae-Woo Kim Minguk Kang Gyeong-Hoon Lee AAML 19 6 0 23 Jun 2022
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS K. Udagawa Yuki Saito Hiroshi Saruwatari 6 5 0 21 Jun 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History Yuto Nishimura Yuki Saito Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari AI4TS 17 7 0 16 Jun 2022
Multimodal Learning with Transformers: A Survey P. Xu Xiatian Zhu David A. Clifton ViT 54 527 0 13 Jun 2022
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos Alexander Waibel M. Behr Fevziye Irem Eyiokur Dogucan Yaman Tuan-Nam Nguyen Carlos Mullov Mehmet Arif Demirtas Alperen Kantarci Stefan Constantin H. K. Ekenel CVBM 15 14 0 09 Jun 2022
FlexLip: A Controllable Text-to-Lip System Dan Oneaţă Beáta Lőrincz Adriana Stan H. Cucu 23 3 0 07 Jun 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech Ziyue Jiang Zhe Su Zhou Zhao Qian Yang Yi Ren Jinglin Liu Zhe Ye 24 4 0 05 Jun 2022
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations Chang Liu Zhenhua Ling Linghui Chen 23 3 0 02 Jun 2022
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation Kun Song Heyang Xue Xinsheng Wang Jian Cong Yongmao Zhang Linfu Xie Bing Yang Xiong Zhang Dan Su 13 5 0 01 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 33 38 0 30 May 2022