NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

9 May 2022
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
Yanqing Liu
Xi Wang
Yichong Leng
Yuanhao Yi
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu

Papers citing "NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality"

28 / 128 papers shown
Title
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
28
170
0
07 Mar 2023
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Yiitan Yuan
Haohe Liu
Jinhua Liang
Xubo Liu
Mark D. Plumbley
Wenwu Wang
22
0
0
07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
26
7
0
06 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
22
8
0
28 Feb 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo P. Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
33
468
0
29 Jan 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
8
6
0
24 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
36
15
0
21 Jan 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
17
22
0
20 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
13
0
0
15 Dec 2022
Memories are One-to-Many Mapping Alleviators in Talking Face Generation
Anni Tang
Tianyu He
Xuejiao Tan
Jun Ling
Liang Song
CVBM
18
23
0
09 Dec 2022
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
Yihan Wu
Junliang Guo
Xuejiao Tan
Chen Zhang
Bohan Li
Ruihua Song
Lei He
Sheng Zhao
Arul Menezes
Jiang Bian
16
15
0
30 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer
Ondřej Klejch
P. Bell
17
7
0
29 Nov 2022
Ontology-aware Learning and Evaluation for Audio Tagging
Haohe Liu
Qiuqiang Kong
Xubo Liu
Xinhao Mei
Wenwu Wang
Mark D. Plumbley
12
4
0
22 Nov 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
Zhifang Guo
Yichong Leng
Yihan Wu
Sheng Zhao
Xuejiao Tan
DiffM
11
88
0
22 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
Yoorim Oh
Juheon Lee
Yoseob Han
Kyogu Lee
13
2
0
11 Nov 2022
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement
Wei Song
Ya Yue
Ya-Jie Zhang
Zhengchen Zhang
Youzheng Wu
Xiaodong He
11
4
0
02 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
19
6
0
26 Oct 2022
Leveraging Demonstrations with Latent Space Priors
Jonas Gehring
Deepak Gopinath
Jungdam Won
Andreas Krause
Gabriel Synnaeve
Nicolas Usunier
28
4
0
26 Oct 2022
JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE
Yueh-Kao Wu
Ching-Yu Chiu
Yi-Hsuan Yang
ViT
19
14
0
12 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
Yanqing Liu
Rui Xue
Lei He
Xu Tan
Sheng Zhao
16
24
0
11 Jul 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
33
38
0
30 May 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
Yichong Leng
Zehua Chen
Junliang Guo
Haohe Liu
Jiawei Chen
...
Lei He
Xiang-Yang Li
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
51
58
0
30 May 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
12
21
0
31 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
14
2
0
21 Mar 2022
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
245
484
0
20 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
Soft-DTW: a Differentiable Loss Function for Time-Series
Marco Cuturi
Mathieu Blondel
AI4TS
127
611
0
05 Mar 2017