FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

8 June 2020

Xu Tan

Zhou Zhao

Papers citing "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

50 / 754 papers shown

Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators Wiebke Hutiri Orestis Papakyriakopoulos Alice Xiang 28 16 0 25 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech Chenpeng Du Yiwei Guo Hankun Wang Yifan Yang Zhikang Niu Shuai Wang Hui Zhang Xie Chen Kai Yu VLM 22 25 0 25 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization Wei-Ping Huang Sung-Feng Huang Hung-yi Lee 29 0 0 23 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions Jiawei Zhang Tianyu Pang Chao Du Yi Ren Bo-wen Li Min-Bin Lin MLLM 27 14 0 22 Jan 2024
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis R. Vinotha D. Hepsiba L. D. V. Anand Deepak John Reji 13 1 0 22 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment Hyoung-Seok Oh Sang-Hoon Lee Deok-Hyun Cho Seong-Whan Lee 39 1 0 16 Jan 2024
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2 Aniket Tathe Anand Kamble Suyash Kumbharkar Atharva Bhandare Anirban C. Mitra 30 1 0 11 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters Kenichi Fujita Hiroshi Sato Takanori Ashihara Hiroki Kanagawa Marc Delcroix Takafumi Moriya Yusuke Ijima 31 8 0 10 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models Zhifeng Xie Shengye Yu Qile He Mengtian Li VLM VGen 28 2 0 09 Jan 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data Xi Chen Jiakun Pei Liumeng Xue Mingyang Zhang 36 4 0 07 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations Yejin Jeon Yunsu Kim Gary Geunbae Lee 32 2 0 04 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction Minchan Kim Myeonghun Jeong Byoung Jin Choi Semin Kim Joun Yeop Lee Nam Soo Kim AI4TS 23 4 0 03 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts Apoorv Vyas Bowen Shi Matt Le Andros Tjandra Yi-Chiao Wu ... Chris Summers Carleigh Wood Joshua Lane Mary Williamson Wei-Ning Hsu 44 75 0 25 Dec 2023
Creating New Voices using Normalizing Flows Piotr Bilinski Thomas Merritt Abdelhamid Ezzerg Kamil Pokora Sebastian Cygert K. Yanagisawa Roberto Barra-Chicote Daniel Korzekwa 18 17 0 22 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations Cheng Gong Xin Wang Erica Cooper Dan Wells Longbiao Wang Jianwu Dang Korin Richmond Junichi Yamagishi 24 21 0 22 Dec 2023
Style Modeling for Multi-Speaker Articulation-to-Speech Miseul Kim Zhenyu Piao Jihyun Lee Hong-Goo Kang 26 8 0 21 Dec 2023
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis Xueyuan Chen Xi Wang Shaofei Zhang Lei He Zhiyong Wu Xixin Wu Helen M. Meng 41 7 0 19 Dec 2023
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling Rui Liu Yifan Hu Yi Ren Xiang Yin Haizhou Li 37 16 0 19 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis Wenhao Guan Yishuang Li Tao Li Hukai Huang Feng Wang Jiayan Lin Lingyan Huang Lin Li Q. Hong 23 8 0 17 Dec 2023
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis Yu Zhang Rongjie Huang Ruiqi Li Jinzheng He Yan Xia Feiyang Chen Xinyu Duan Baoxing Huai Zhou Zhao VLM 16 17 0 17 Dec 2023
CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis Yayue Deng Jinlong Xue Yukang Jia Qifei Li Yichen Han Fengping Wang Yingming Gao Dengfeng Ke Ya Li 30 7 0 16 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Xueyao Zhang Liumeng Xue Yicheng Gu Yuancheng Wang Haorui He ... Mingxuan Wang Jun Han Kai Chen Haizhou Li Zhizheng Wu 27 26 0 15 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism Georgios Milis P. Filntisis A. Roussos Petros Maragos CVBM 34 2 0 11 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis Zehua Chen Guande He Kaiwen Zheng Xu Tan Jun Zhu DiffM 53 21 0 06 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking Chang-rui Liu Jie Zhang Tianwei Zhang Xi Yang Weiming Zhang Neng H. Yu 25 28 0 06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation J. Choi Se Jin Park Minsu Kim Y. Ro 25 12 0 05 Dec 2023
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking Jihyun Lee Yejin Jeon Wonjun Lee Yunsu Kim Gary Geunbae Lee 15 1 0 04 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning Raviraj Joshi Nikesh Garera 25 0 0 02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints Raviraj Joshi Nikesh Garera 25 0 0 02 Dec 2023
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices Gokul Srinivasagan Michael Deisher Munir Georges VLM 19 0 0 30 Nov 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes Pavel Korshunov Haolin Chen Philip N. Garner Sébastien Marcel CVBM 43 4 0 29 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Sang-Hoon Lee Haram Choi Seung-Bin Kim Seong-Whan Lee BDL 27 31 0 21 Nov 2023
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation Jiangzong Wang Pengcheng Li Xulong Zhang Ning Cheng Jing Xiao 24 0 0 14 Nov 2023
SponTTS: modeling and transferring spontaneous style for TTS Hanzhao Li Xinfa Zhu Liumeng Xue Yang Song Yunlin Chen Lei Xie 19 7 0 13 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Rishabh Jain Peter Corcoran 20 0 0 07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction Minchan Kim Myeonghun Jeong Byoung Jin Choi Dongjune Lee N. Kim AI4TS 25 10 0 06 Nov 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations Hanglei Zhang Yiwei Guo Sen Liu Xie Chen Kai Yu 17 0 0 02 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Yuan Gao Nobuyuki Morioka Yu Zhang Nanxin Chen DiffM 26 27 0 02 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN Neeraj Kumar Ankur Narang Brejesh Lall DiffM 21 0 0 27 Oct 2023
Enabling Acoustic Audience Feedback in Large Virtual Events Tamay Aykut M. Hofbauer Christopher B. Kuhn Eckehard Steinbach Bernd Girod 38 0 0 27 Oct 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions Florian Lux Pascal Tilli Sarina Meyer Ngoc Thang Vu 15 2 0 26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023 Florian Lux Julia Koch Sarina Meyer Thomas Bott Nadja Schauffler Pavel Denisov Antje Schweitzer Ngoc Thang Vu 19 6 0 26 Oct 2023
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning Xinfa Zhu Yuke Li Yinjiao Lei Ning Jiang Guoqing Zhao Lei Xie 23 0 0 26 Oct 2023
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control Elif Bozkurt 34 0 0 25 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Marek Kubis Pawel Skórzewski Marcin Sowański Tomasz Ziętkiewicz 11 6 0 25 Oct 2023
Generative Pre-training for Speech with Flow Matching Alexander H. Liu Matt Le Apoorv Vyas Bowen Shi Andros Tjandra Wei-Ning Hsu 19 31 0 25 Oct 2023
Non-autoregressive Streaming Transformer for Simultaneous Translation Zhengrui Ma Shaolei Zhang Shoutao Guo Chenze Shao Min Zhang Yang Feng 24 12 0 23 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo Hyukhun Koh Kyomin Jung DiffM 39 0 0 23 Oct 2023
An overview of text-to-speech systems and media applications Mohammad Reza Hasanabadi 11 3 0 22 Oct 2023
Energy-Based Models For Speech Synthesis Wanli Sun Zehai Tu Anton Ragni DiffM 24 0 0 19 Oct 2023