FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

8 June 2020

Xu Tan

Zhou Zhao

Papers citing "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

50 / 754 papers shown

Title
Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue Daxin Tan Nikos Kargas David McHardy C. Papayiannis A. Bonafonte Marek Střelec Jonas Rohnke A. Filandras Trevor Wood 6 0 0 07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data Soroosh Mariooryad Matt Shannon Siyuan Ma Tom Bagby David Kao Daisy Stanton Eric Battenberg RJ Skerry-Ryan 17 2 0 06 Dec 2022
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing Yihan Wu Junliang Guo Xuejiao Tan Chen Zhang Bohan Li Ruihua Song Lei He Sheng Zhao Arul Menezes Jiang Bian 32 15 0 30 Nov 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech Byoung Jin Choi Myeonghun Jeong Joun Yeop Lee N. Kim 23 13 0 30 Nov 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations Nikolaos Ellinas Myrsini Christidou Alexandra Vioni June Sig Sung Aimilios Chalamandaris Pirros Tsiakoulis P. Mastorocostas 17 7 0 29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions Christoph Minixhofer Ondˇrej Klejch P. Bell 33 7 0 29 Nov 2022
Efficient Incremental Text-to-Speech on GPUs Muyang Du Chuan Liu Jiaxing Qi Junjie Lai 18 1 0 25 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs Harm Lameris Shivam Mehta G. Henter Joakim Gustafson Éva Székely 38 15 0 24 Nov 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions Zhifang Guo Yichong Leng Yihan Wu Sheng Zhao Xuejiao Tan DiffM 13 90 0 22 Nov 2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement Chenye Cui Yi Ren Jinglin Liu Rongjie Huang Zhou Zhao VGen 30 14 0 19 Nov 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling Xinfa Zhu Yinjiao Lei Kun Song Yongmao Zhang Tao Li Linfu Xie 13 17 0 19 Nov 2022
3d human motion generation from the text via gesture action classification and the autoregressive model Gwantae Kim Youngsuk Ryu Junyeop Lee D. Han Jeongmin Bae Hanseok Ko 9 2 0 18 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models Minki Kang Dong Min Sung Ju Hwang DiffM 25 48 0 17 Nov 2022
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech Perry Lam Huayun Zhang Nancy F. Chen Berrak Sisman Dorien Herremans VLM 27 0 0 14 Nov 2022
BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance Haotong Qin Xudong Ma Yifu Ding X. Li Yang Zhang Zejun Ma Jiakai Wang Jie Luo Xianglong Liu MQ 40 20 0 13 Nov 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy Ya-Jie Zhang Wei Song Ya Yue Zhengchen Zhang Youzheng Wu Xiaodong He 26 7 0 11 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations Yoorim Oh Juheon Lee Yoseob Han Kyogu Lee 20 3 0 11 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech Xiaoran Fan Chao Pang Tian Yuan Richard He Bai Renjie Zheng ... Junkun Chen Zeyu Chen Liang Huang Yu Sun Hua-Hong Wu 40 0 0 07 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 21 6 0 07 Nov 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer Yongmao Zhang Heyang Xue Hanzhao Li Linfu Xie Tingwei Guo Ruixiong Zhang Caixia Gong DiffM VLM 23 28 0 05 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS Dongchao Yang Songxiang Liu Jianwei Yu Helin Wang Chao Weng Yuexian Zou DiffM VLM 38 18 0 04 Nov 2022
Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts Detai Xin Sharath Adavanne F. Ang Ashish Kulkarni Shinnosuke Takamichi Hiroshi Saruwatari 26 13 0 04 Nov 2022
Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation Yingjie Song Wei Song Wei Zhang Zhengchen Zhang Dan Zeng Zhi Liu Yang Yu DiffM 11 5 0 02 Nov 2022
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement Wei Song Ya Yue Ya-Jie Zhang Zhengchen Zhang Youzheng Wu Xiaodong He 21 4 0 02 Nov 2022
SIMD-size aware weight regularization for fast neural vocoding on CPU Hiroki Kanagawa Yusuke Ijima 11 0 0 02 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers Cheng-Ping Hsieh Subhankar Ghosh Boris Ginsburg 41 18 0 01 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis Karolos Nikitaras Konstantinos Klapsas Nikolaos Ellinas Georgia Maniati June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 14 0 0 01 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Georgia Maniati Panos Kakoulidis June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 24 2 0 31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis Anusha Prakash H. Murthy 26 4 0 31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong Yun Wang Prabhav Agrawal Vimal Manohar Jilong Wu Thilo Kohler Qing He 15 0 0 28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis Yuma Shirahata Ryuichi Yamamoto Eunwoo Song Ryo Terashima Jae-Min Kim Kentaro Tachibana 23 10 0 28 Oct 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation Nobuyuki Morioka Heiga Zen Nanxin Chen Yu Zhang Yifan Ding 34 16 0 28 Oct 2022
Explicit Intensity Control for Accented Text-to-speech Rui Liu Haolin Zuo De Hu Guanglai Gao Haizhou Li 21 6 0 27 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis Yifan Hu Rui Liu Guanglai Gao Haizhou Li 75 7 0 27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations Haohan Guo Fenglong Xie Xixin Wu Hui Lu Helen Meng 62 3 0 27 Oct 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection Kentaro Seki Shinnosuke Takamichi Takaaki Saeki Hiroshi Saruwatari 23 6 0 26 Oct 2022
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network Chunhui Wang Chang Zeng Xing He 12 19 0 26 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao AI4TS 11 5 0 25 Oct 2022
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao 19 1 0 25 Oct 2022
Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao 25 1 0 25 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS Ziqi Liang 25 0 0 24 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation Chunhui Wang Chang Zeng Jun Chen Xingji He 44 7 0 23 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 30 22 0 21 Oct 2022
Robust One-Shot Singing Voice Conversion Naoya Takahashi M. Singh Yuki Mitsufuji DiffM 22 8 0 20 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models Aya Watanabe Shinnosuke Takamichi Yuki Saito Detai Xin Hiroshi Saruwatari 14 3 0 18 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion Yuta Matsunaga Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 19 1 0 18 Oct 2022
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images Hien Ohnaka Shinnosuke Takamichi Keisuke Imoto Yuki Okamoto Kazuki Fujii Hiroshi Saruwatari DiffM 19 8 0 17 Oct 2022
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling Jinchao Zhang Shuyang Jiang Jiangtao Feng Lin Zheng Lingpeng Kong 3DV 43 9 0 14 Oct 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis Yuta Matsunaga Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 12 2 0 14 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario Emily R. Bartusiak Edward J. Delp 19 12 0 14 Oct 2022