ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.04421
  4. Cited By
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
  Quality

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

9 May 2022
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
Yanqing Liu
Xi Wang
Yichong Leng
Yuanhao Yi
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
ArXivPDFHTML

Papers citing "NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality"

50 / 128 papers shown
Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
13
0
0
14 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
29
0
0
12 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
H. M. Zhang
...
Yichen Xiao
Yiying Zhou
Y. Zhang
Yuan Lu
Yucen He
21
0
0
12 May 2025
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
Jianbo Gao
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
AAML
28
0
0
28 Apr 2025
Protecting Your Voice: Temporal-aware Robust Watermarking
Protecting Your Voice: Temporal-aware Robust Watermarking
Yue Li
Weizhi Liu
Dongdong Lin
30
0
0
21 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
23
0
0
14 Apr 2025
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services
Shira Michel
Sufi Kaur
Sarah Elizabeth Gillespie
Jeffrey Gleason
Christo Wilson
A. Ghosh
22
0
0
12 Apr 2025
Watermarking for AI Content Detection: A Review on Text, Visual, and Audio Modalities
Watermarking for AI Content Detection: A Review on Text, Visual, and Audio Modalities
Lele Cao
36
0
0
02 Apr 2025
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
Leyang Wang
Joice Lin
DiffM
63
0
0
20 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
A. Hengel
Yuankai Qi
83
2
0
15 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
Yiming Li
Kaiying Yan
Shuo Shao
Tongqing Zhai
Shu-Tao Xia
Z. Qin
D. Tao
AAML
107
0
0
02 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
91
0
0
01 Mar 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
49
0
0
26 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
91
0
0
21 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
54
4
0
28 Jan 2025
MathReader : Text-to-Speech for Mathematical Documents
MathReader : Text-to-Speech for Mathematical Documents
Sieun Hyeon
Kyudan Jung
N. Kim
Hyun Gon Ryu
Jaeyoung Do
36
1
0
13 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
47
0
0
31 Dec 2024
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Ze Yuan
Yanqing Liu
Shujie Liu
Sheng Zhao
AuLLM
74
1
0
06 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from
  Text
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
64
1
0
03 Dec 2024
Deepfake Media Generation and Detection in the Generative AI Era: A
  Survey and Outlook
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
F. Khan
Mubarak Shah
82
2
0
29 Nov 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with
  Integrated Text Analysis
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
27
0
0
24 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow
  Matching
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
25
51
0
09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech
  Synthesis with Discrete Codec Modeling of EnGen-TTS
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
21
0
0
09 Oct 2024
Zero-Shot Text-to-Speech from Continuous Text Streams
Zero-Shot Text-to-Speech from Continuous Text Streams
Trung D. Q. Dang
David Aponte
Dung Tran
Tianyi Chen
K. Koishida
AuLLM
VLM
32
3
0
01 Oct 2024
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual
  and Low-Resource Text-to-Speech
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
Youngjae Kim
Yejin Jeon
Gary Geunbae Lee
21
1
0
27 Sep 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice
  Control
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
32
2
0
26 Sep 2024
FastTalker: Jointly Generating Speech and Conversational Gestures from
  Text
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
Zixin Guo
Jian Zhang
29
1
0
24 Sep 2024
Speechworthy Instruction-tuned Language Models
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
40
1
0
23 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
30
4
0
18 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style
  Temporal Modeling in Text-to-Speech
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Xuefei Liu
Guanjun Li
DiffM
28
0
0
18 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis
  with Distilled Time-Varying Style Diffusion
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for
  Prosody Modeling
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
31
0
0
13 Sep 2024
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic
  Framework and its Applicability in Automatic Pronunciation Assessment
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
Tien-Hong Lo
Meng-Ting Tsai
Berlin Chen
30
0
0
11 Sep 2024
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A
  High-Quality WaveGlow Vocoder Approach
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
Abdulhady Abas Abdullah
Sabat Salih Muhamad
Hadi Veisi
20
0
0
10 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec
  Transformer
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Shunsi Zhang
Zhizheng Wu
26
38
0
01 Sep 2024
SSDM: Scalable Speech Dysfluency Modeling
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
30
1
0
29 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like
  Spontaneous Representation
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
31
1
0
01 Aug 2024
An Unsupervised Domain Adaptation Method for Locating Manipulated Region
  in partially fake Audio
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
Siding Zeng
Jiangyan Yi
Jianhua Tao
Yujie Chen
Shan Liang
Yong Ren
Xiaohui Zhang
30
0
0
11 Jul 2024
SaMoye: Zero-shot Singing Voice Conversion Based on Feature
  Disentanglement and Synthesis
SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis
Zihao Wang
Le Ma
Yan Liu
K. Zhang
DRL
24
0
0
10 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech
  Synthesis
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
46
0
26 Jun 2024
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li
Meiying Melissa Chen
Yi Zhong
Pinxin Liu
Zhiyao Duan
34
0
0
15 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under
  Low-resource Scenarios
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
35
2
0
13 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
K. Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
28
3
0
12 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
31
6
0
10 Jun 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot
  TTS
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
Zirun Zhu
...
Yufei Xia
Jinzhu Li
Sheng Zhao
Jinyu Li
Naoyuki Kanda
40
3
0
09 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text
  to Speech Synthesizers
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
42
64
0
08 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using
  Diffusion Models
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
46
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
31
3
0
06 Jun 2024
123
Next