Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1703.10135
Cited By
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 259 papers shown
Title
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
19
18
0
30 Aug 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
32
19
0
25 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues
Junjie H. Xu
Zhou Fang
Qihang Chen
Satoru Ohno
Pujana Paliyawan
14
4
0
18 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
30
6
0
16 Aug 2021
FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
Hasam Khalid
Shahroz Tariq
Minha Kim
Simon S. Woo
29
183
0
11 Aug 2021
SpecMix : A Mixed Sample Data Augmentation method for Training withTime-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
44
42
0
06 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
15
22
0
27 Jul 2021
Human Perception of Audio Deepfakes
Nicolas M. Muller
Karla Markert
Konstantin Böttinger
22
49
0
20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
24
15
0
17 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures
Prateek Verma
C. Chafe
16
28
0
30 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
23
36
0
29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery
Muvazima Mansoor
Srikanth Chandar
Ramamoorthy Srinath
18
0
0
27 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
23
3
0
21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Dan Su
13
29
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
31
9
0
18 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
21
34
0
17 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
19
7
0
14 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
H. Meng
Chao Weng
Dan Su
31
24
0
11 Jun 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
23
12
0
05 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
18
24
0
20 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Yang Gao
Tyler Vuong
Mahsa Elyasi
Gaurav Bharaj
Rita Singh
18
19
0
08 Apr 2021
Phoneme-based Distribution Regularization for Speech Enhancement
Yajing Liu
Xiulian Peng
Zhiwei Xiong
Yan Lu
10
4
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
H. Meng
15
47
0
08 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark J. F. Gales
21
7
0
02 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
Kun Zhou
Berrak Sisman
Haizhou Li
13
27
0
31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
19
81
0
28 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
20
186
0
01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
31
22
0
12 Feb 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
17
8
0
03 Jan 2021
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
13
9
0
03 Dec 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
25
55
0
17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
19
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
19
97
0
06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
20
19
0
04 Nov 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
14
217
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
17
102
0
22 Oct 2020
NU-GAN: High resolution neural upsampling with GAN
Rithesh Kumar
Kundan Kumar
Vicki Anand
Yoshua Bengio
Aaron Courville
16
25
0
22 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Shengkui Zhao
Trung Hieu Nguyen
Hao Wang
B. Ma
6
25
0
16 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Wei Ping
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
31
1,387
0
21 Sep 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
24
66
0
14 Sep 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon
Sang-Hoon Lee
Hyeong-Rae Noh
Seong-Whan Lee
18
11
0
16 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
16
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
27
316
0
09 Aug 2020
Xiaomingbot: A Multilingual Robot News Reporter
Runxin Xu
Jun Cao
Mingxuan Wang
Jiaze Chen
Hao Zhou
...
Xiang Yin
Xijin Zhang
Songcheng Jiang
Yuxuan Wang
Lei Li
16
11
0
12 Jul 2020
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement
Luka Chkhetiani
Levan Bejanidze
17
1
0
13 Jun 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
17
2
0
12 Jun 2020
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
19
20
0
10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
19
109
0
08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
45
1,354
0
08 Jun 2020
Previous
1
2
3
4
5
6
Next