FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

8 June 2020
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu

Papers citing "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

50 / 754 papers shown
Title
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
H. Meng
Chao Weng
Dan Su
31
23
0
11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache
René Peinl
16
0
0
11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
21
839
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
S. Hwang
11
160
0
06 Jun 2021
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers
Zhu Zhang
Jianxin Ma
Chang Zhou
Rui Men
Zhikang Li
Ming Ding
Jie Tang
Jingren Zhou
Hongxia Yang
25
46
0
29 May 2021
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
Chenpeng Du
K. Yu
9
18
0
27 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
14
11
0
17 May 2021
SpeechNet: A Universal Modularized Model for Speech Processing Tasks
Yi-Chen Chen
Po-Han Chi
Shu-Wen Yang
Kai-Wei Chang
Jheng-hao Lin
Sung-Feng Huang
Da-Rong Liu
Chi-Liang Liu
Cheng-Kuang Lee
Hung-yi Lee
MoE
21
17
0
07 May 2021
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Jinglin Liu
Chengxi Li
Yi Ren
Feiyang Chen
Zhou Zhao
DiffM
44
258
0
06 May 2021
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis
Kosuke Futamata
Byeong-Cheol Park
Ryuichi Yamamoto
Kentaro Tachibana
20
14
0
26 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
13
24
0
20 Apr 2021
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
Yuzi Yan
Xu Tan
Bohan Li
Tao Qin
Sheng Zhao
Yuan-Chung Shen
Tie-Yan Liu
12
44
0
20 Apr 2021
Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement
Haoyu Li
Junichi Yamagishi
11
9
0
17 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
16
8
0
16 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka
Kou Tanaka
Takuhiro Kaneko
31
21
0
14 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Dan Su
H. Meng
22
6
0
14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi
Wen-Chin Huang
Kazuhiro Kobayashi
T. Toda
6
23
0
14 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Myeonghun Jeong
Hyeongju Kim
Sung Jun Cheon
Byoung Jin Choi
N. Kim
DiffM
17
189
0
03 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark J. F. Gales
21
7
0
02 Apr 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech
M. Kang
Jihyun Lee
Simin Kim
Injung Kim
6
6
0
01 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
16
81
0
28 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
19
30
0
17 Mar 2021
CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge
Daxin Tan
Hingpang Huang
Guangyan Zhang
Tan Lee
12
6
0
08 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
14
68
0
06 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
18
186
0
01 Mar 2021
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng
Xu Tan
Sheng Zhao
Frank Soong
Xiang-Yang Li
Tao Qin
22
95
0
27 Feb 2021
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
Brooke Stephenson
Thomas Hueber
Laurent Girin
Laurent Besacier
28
10
0
19 Feb 2021
Context-Aware Prosody Correction for Text-Based Speech Editing
Max Morrison
Lucas Rencker
Zeyu Jin
Nicholas J. Bryan
Juan-Pablo Caceres
Bryan Pardo
24
28
0
16 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
28
22
0
12 Feb 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Jinzhu Li
Sheng Zhao
Enhong Chen
Tie-Yan Liu
12
58
0
08 Feb 2021
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
Chenpeng Du
K. Yu
23
17
0
01 Feb 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
39
38
0
23 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang
Yi Ren
Xu Tan
Jinglin Liu
Ke-jun Zhang
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
24
37
0
17 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
8
5
0
14 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao
Shuang Liang
Zhencheng Liu
Minchuan Chen
Jun Ma
Shaojun Wang
Jing Xiao
14
38
0
07 Dec 2020
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang
D. Lim
Jaesam Yoon
17
31
0
19 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
19
36
0
12 Nov 2020
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement
Hamed Hemati
Damian Borth
6
9
0
12 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
14
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
13
97
0
06 Nov 2020
Speech Synthesis and Control Using Differentiable DSP
Giorgio Fabbro
Vladimir Golkov
Thomas Kemp
Daniel Cremers
11
12
0
28 Oct 2020
Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators
Ryuichi Yamamoto
Eunwoo Song
Min-Jae Hwang
Jae-Min Kim
19
17
0
27 Oct 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention
Yist Y. Lin
C. Chien
Jheng-hao Lin
Hung-yi Lee
Lin-Shan Lee
8
78
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
34
262
0
26 Oct 2020
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis
Min-Jae Hwang
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
18
31
0
26 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
17
102
0
22 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
13
16
0
19 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
14
112
0
08 Oct 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
14
92
0
03 Sep 2020
FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
Jinglin Liu
Yi Ren
Zhou Zhao
Chen Zhang
Baoxing Huai
Jing Yuan
4
11
0
06 Aug 2020