Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1703.10135
Cited By
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 250 papers shown
Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
15
0
0
14 May 2025
On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Stud
Hyouin Liu
Zhikuan Zhang
26
0
0
12 May 2025
Beyond Identity: A Generalizable Approach for Deepfake Audio Detection
Yasaman Ahmadiadli
Xiao-Ping Zhang
Naimul Khan
28
0
0
10 May 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Z. Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
51
0
0
07 Apr 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
56
4
0
28 Jan 2025
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Dongyang Dai
Zhiyong Wu
Shiyin Kang
Xixin Wu
Jia Jia
Dan Su
Dong Yu
H. Meng
25
25
0
03 Jan 2025
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
44
1
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
48
3
0
23 Sep 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
33
1
0
01 Aug 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
50
11
0
25 Jun 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
33
2
0
09 Jun 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
30
0
0
13 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
24
2
0
06 Mar 2024
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
Abhinav Garg
Jiyeon Kim
Sushil Khyalia
Chanwoo Kim
Dhananjaya N. Gowda
20
2
0
19 Jan 2024
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
37
16
0
19 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
27
26
0
15 Dec 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
40
6
0
19 Oct 2023
Large-Scale Automatic Audiobook Creation
Brendan Walsh
Mark Hamilton
Greg Newby
Xi Wang
Serena Ruan
...
Lei He
Shaofei Zhang
Eric Dettinger
William T. Freeman
Markus Weimer
23
1
0
07 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
16
2
0
06 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
24
8
0
02 Sep 2023
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis
Linsen Song
Wayne Wu
Chaoyou Fu
Chen Change Loy
Ran He
23
10
0
31 Aug 2023
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jing Chen
Xingcheng Song
Zhendong Peng
Binbin Zhang
Fuping Pan
Zhiyong Wu
DiffM
19
16
0
31 Aug 2023
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting
Haibo Wang
Shiwan Zhao
Xiguang Zheng
Yong Qin
21
11
0
31 Aug 2023
CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making
Giovanni Minelli
Mirco Musolesi
33
0
0
21 Aug 2023
Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models
Heyang Xue
Shuai Guo
Pengcheng Zhu
Mengxiao Bi
DiffM
35
1
0
21 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
20
1
0
03 Aug 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
H. Meng
22
7
0
29 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
Liangyu Zha
Junlin Zhou
Liyao Li
Rui Wang
Qingyi Huang
...
Xing-yan Deng
J. Xu
Haobo Wang
Gang Chen
J. Zhao
RALM
LMTD
32
42
0
17 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
17
5
0
03 Jul 2023
Visual-Aware Text-to-Speech
Mohan Zhou
Yalong Bai
Wei Zhang
Ting Yao
T. Zhao
Tao Mei
11
1
0
21 Jun 2023
Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
19
3
0
30 May 2023
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings
Yuki Saito
Shinnosuke Takamichi
Eiji Iimori
Kentaro Tachibana
Hiroshi Saruwatari
43
11
0
23 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
28
20
0
22 May 2023
EE-TTS: Emphatic Expressive TTS with Linguistic Information
Yifan Zhong
Chen Zhang
Xule Liu
Chenxi Sun
Weishan Deng
Haifeng Hu
Zhongqian Sun
13
3
0
20 May 2023
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah
Vishal Tambrahalli
Saiteja Kosgi
N. Pedanekar
Vineet Gandhi
33
0
0
19 May 2023
A unified front-end framework for English text-to-speech synthesis
Zelin Ying
Chen Li
Yuming Dong
Qiuqiang Kong
Yuping Wang
Yuanyuan Huo
Yuxuan Wang
16
4
0
18 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
18
13
0
13 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
30
40
0
11 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
19
1
0
09 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
H. Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
29
3
0
09 May 2023
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
17
1
0
19 Apr 2023
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
H. Meng
25
6
0
13 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
H. Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
18
2
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
Hyun Joon Park
Seok Woo Yang
Jin Sob Kim
Wooseok Shin
S. W. Han
17
17
0
16 Mar 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Ying Zhang
Xiaorui Wang
Zhong-ming Wang
38
9
0
14 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
26
7
0
06 Mar 2023
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM
DRL
22
1
0
04 Mar 2023
1
2
3
4
5
Next