ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1703.10135
  4. Cited By
Tacotron: Towards End-to-End Speech Synthesis

Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
ArXivPDFHTML

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 259 papers shown
Title
Neural HMMs are all you need (for high-quality attention-free TTS)
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
19
18
0
30 Aug 2021
Integrated Speech and Gesture Synthesis
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
32
19
0
25 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing
  Highlight Cues
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues
Junjie H. Xu
Zhou Fang
Qihang Chen
Satoru Ohno
Pujana Paliyawan
14
4
0
18 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
30
6
0
16 Aug 2021
FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
Hasam Khalid
Shahroz Tariq
Minha Kim
Simon S. Woo
29
183
0
11 Aug 2021
SpecMix : A Mixed Sample Data Augmentation method for Training
  withTime-Frequency Domain Features
SpecMix : A Mixed Sample Data Augmentation method for Training withTime-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
44
42
0
06 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech
  Synthesis
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
15
22
0
27 Jul 2021
Human Perception of Audio Deepfakes
Human Perception of Audio Deepfakes
Nicolas M. Muller
Karla Markert
Konstantin Böttinger
22
49
0
20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio
Learning De-identified Representations of Prosody from Raw Audio
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
24
15
0
17 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures
A Generative Model for Raw Audio Using Transformer Architectures
Prateek Verma
C. Chafe
16
28
0
30 Jun 2021
A Survey on Neural Speech Synthesis
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech
  Synthesis
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
23
36
0
29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery
AI based Presentation Creator With Customized Audio Content Delivery
Muvazima Mansoor
Srikanth Chandar
Ramamoorthy Srinath
18
0
0
27 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style
  Control
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
23
3
0
21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Dan Su
13
29
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in
  End-to-end Neural TTS
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
31
9
0
18 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional
  Text-to-Speech Model
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
21
34
0
17 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
19
7
0
14 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis
  with Graph-based Multi-modal Context Modeling
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
H. Meng
Chao Weng
Dan Su
31
24
0
11 Jun 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
23
12
0
05 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
18
24
0
20 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Yang Gao
Tyler Vuong
Mahsa Elyasi
Gaurav Bharaj
Rita Singh
18
19
0
08 Apr 2021
Phoneme-based Distribution Regularization for Speech Enhancement
Phoneme-based Distribution Regularization for Speech Enhancement
Yajing Liu
Xiulian Peng
Zhiwei Xiong
Yan Lu
10
4
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
H. Meng
15
47
0
08 Apr 2021
Attention Forcing for Machine Translation
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark J. F. Gales
21
7
0
02 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech:
  Two-stage Sequence-to-Sequence Training
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
Kun Zhou
Berrak Sisman
Haizhou Li
13
27
0
31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
19
81
0
28 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
20
186
0
01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep
  VAE with Residual Attention
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
31
22
0
12 Feb 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural
  Networks
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
17
8
0
03 Jan 2021
GraphPB: Graphical Representations of Prosody Boundary in Speech
  Synthesis
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
13
9
0
03 Dec 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for
  Emotional Speech Synthesis
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
25
55
0
17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech
  Synthesis via Phone-Level Content-Style Disentanglement
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
19
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
19
97
0
06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural
  Text-to-Speech
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
20
19
0
04 Nov 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
14
217
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
17
102
0
22 Oct 2020
NU-GAN: High resolution neural upsampling with GAN
NU-GAN: High resolution neural upsampling with GAN
Rithesh Kumar
Kundan Kumar
Vicki Anand
Yoshua Bengio
Aaron Courville
16
25
0
22 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on
  Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Shengkui Zhao
Trung Hieu Nguyen
Hao Wang
B. Ma
6
25
0
16 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Wei Ping
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
31
1,387
0
21 Sep 2020
Controllable neural text-to-speech synthesis using intuitive prosodic
  features
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
24
66
0
14 Sep 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based
  Neural Vocoder
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon
Sang-Hoon Lee
Hyeong-Rae Noh
Seong-Whan Lee
18
11
0
16 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
16
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical
  Modeling to Deep Learning
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
27
316
0
09 Aug 2020
Xiaomingbot: A Multilingual Robot News Reporter
Xiaomingbot: A Multilingual Robot News Reporter
Runxin Xu
Jun Cao
Mingxuan Wang
Jiaze Chen
Hao Zhou
...
Xiang Yin
Xijin Zhang
Songcheng Jiang
Yuxuan Wang
Lei Li
16
11
0
12 Jul 2020
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement
Luka Chkhetiani
Levan Bejanidze
17
1
0
13 Jun 2020
Neural voice cloning with a few low-quality samples
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
17
2
0
12 Jun 2020
Deep generative models for musical audio synthesis
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
19
20
0
10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
19
109
0
08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
45
1,354
0
08 Jun 2020
Previous
123456
Next