Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 259 papers shown
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
26
12
0
01 Jun 2020
DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices
Run Wang
Felix Juefei Xu
Yihao Huang
Qing-Wu Guo
Xiaofei Xie
L. Ma
Yang Liu
AAML
14
104
0
28 May 2020
Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Björn W. Schuller
33
28
0
18 May 2020
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
T. Toda
ViT
22
30
0
18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
14
61
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
13
119
0
12 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
16
32
0
12 May 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
25
56
0
04 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Jing Xiao
14
21
0
04 Mar 2020
End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification
Yusuke Fujita
Shinji Watanabe
Shota Horiguchi
Yawen Xue
Kenji Nagamatsu
12
49
0
24 Feb 2020
Speech-to-Singing Conversion in an Encoder-Decoder Framework
Jayneel Parekh
Preeti Rao
Yi-Hsuan Yang
12
11
0
16 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
14
92
0
06 Feb 2020
Vocoder-free End-to-End Voice Conversion with Transformer Network
June-Woo Kim
H. Jung
Minho Lee
15
4
0
05 Feb 2020
On the Resilience of Biometric Authentication Systems against Random Inputs
Benjamin Zi Hao Zhao
H. Asghar
M. Kâafar
AAML
31
23
0
13 Jan 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schlüter
Hermann Ney
8
83
0
19 Dec 2019
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection
Shubhi Tyagi
M. Nicolis
Jonas Rohnke
Thomas Drugman
Jaime Lorenzo-Trueba
26
32
0
02 Dec 2019
Emotional Voice Conversion using Multitask Learning with Text-to-speech
Tae-Ho Kim
Sungjae Cho
Shinkook Choi
Sejik Park
Soo-Young Lee
14
37
0
11 Nov 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Junjie Pan
Xiang Yin
Zhiling Zhang
Shichao Liu
Yang Zhang
Zejun Ma
Yuxuan Wang
9
26
0
11 Nov 2019
Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
Alexander H. Liu
Tao Tu
Hung-yi Lee
Lin-Shan Lee
SSL
19
50
0
28 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
M. Whitehill
Shuang Ma
Daniel J. McDuff
Yale Song
18
35
0
25 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
T. Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
16
201
0
24 Oct 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Eric Battenberg
RJ Skerry-Ryan
Soroosh Mariooryad
Daisy Stanton
David Kao
Matt Shannon
Tom Bagby
14
113
0
23 Oct 2019
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach
Noé Tits
Kevin El Haddad
Thierry Dutoit
11
8
0
14 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training
Qingyun Dou
Yiting Lu
Joshua Efiong
Mark J. F. Gales
19
6
0
26 Sep 2019
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
223
239
0
25 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
H. Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
23
716
0
13 Sep 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
17
8
0
30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
20
22
0
19 Aug 2019
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito
William N. Havard
Mahault Garnerin
Éric Le Ferrand
Laurent Besacier
22
46
0
30 Jul 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
12
10
0
05 Jul 2019
Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations
Jing-Xuan Zhang
Zhenhua Ling
Lirong Dai
22
99
0
25 Jun 2019
Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
Wei Fang
Yu-An Chung
James R. Glass
11
27
0
17 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis
Zack Hodari
O. Watts
Simon King
21
29
0
10 Jun 2019
KERMIT: Generative Insertion-Based Modeling for Sequences
William Chan
Nikita Kitaev
Kelvin Guu
Mitchell Stern
Jakob Uszkoreit
VLM
23
65
0
04 Jun 2019
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
19
3
0
03 Jun 2019
Non-Autoregressive Neural Text-to-Speech
Kainan Peng
Wei Ping
Z. Song
Kexin Zhao
27
39
0
21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
13
75
0
17 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
Yi Ren
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
38
101
0
13 May 2019
Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion
Orhan Ocal
Oguz H. Elibol
Gokce Keskin
Cory Stephenson
Anil Thomas
K. Ramchandran
16
10
0
09 May 2019
Deep Learning for Audio Signal Processing
Hendrik Purwins
Bo-wen Li
Tuomas Virtanen
Jan Schlüter
Shuo-yiin Chang
Tara N. Sainath
VLM
24
584
0
30 Apr 2019
Audio-Linguistic Embeddings for Spoken Sentences
Albert Haque
Michelle Guo
Prateek Verma
Li Fei-Fei
20
51
0
20 Feb 2019
Efficient Convolutional Neural Network Training with Direct Feedback Alignment
Donghyeon Han
H. Yoo
3DV
16
17
0
06 Jan 2019
Learning pronunciation from a foreign language in speech synthesis networks
Younggun Lee
Suwon Shon
Taesu Kim
20
26
0
23 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
20
159
0
05 Nov 2018
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention
Bajibabu Bollepalli
Lauri Juvela
P. Alku
13
4
0
29 Oct 2018
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language
Yusuke Yasuda
Xin Wang
Shinji Takaki
Junichi Yamagishi
14
86
0
29 Oct 2018
Sequence-to-Sequence Acoustic Modeling for Voice Conversion
Jing-Xuan Zhang
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Lirong Dai
11
129
0
16 Oct 2018
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung
Yuxuan Wang
Wei-Ning Hsu
Yu Zhang
RJ Skerry-Ryan
17
117
0
30 Aug 2018
Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences
Cheng-chieh Yeh
Po-Chun Hsu
Ju-Chieh Chou
Hung-yi Lee
Lin-Shan Lee
25
23
0
09 Aug 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan
13
122
0
04 Aug 2018