ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.05884
  4. Cited By
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram
  Predictions
v1v2 (latest)

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
ArXiv (abs)PDFHTML

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Title
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for
  Pronunciation Enhancement
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement
Gyeong-Hoon Lee
Tae-Woo Kim
Hanbin Bae
Min-Ji Lee
Young-Ik Kim
Hoon-Young Cho
VLM
79
20
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech
  Synthesis
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
134
37
0
29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech
  Synthesis
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak
Jaesung Bae
Hanbin Bae
Young-Ik Kim
Hoon-Young Cho
120
17
0
29 Jun 2021
Transflower: probabilistic autoregressive dance generation with
  multimodal attention
Transflower: probabilistic autoregressive dance generation with multimodal attention
Guillermo Valle Pérez
G. Henter
Jonas Beskow
A. Holzapfel
Pierre-Yves Oudeyer
Simon Alexanderson
128
43
0
25 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
Zhengxi Liu
Y. Qian
DRL
49
10
0
25 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource
  Highly Expressive Speech
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech
Raahil Shah
Kamil Pokora
Abdelhamid Ezzerg
V. Klimkov
Goeric Huybrechts
Bartosz Putrycz
Daniel Korzekwa
Thomas Merritt
64
26
0
24 Jun 2021
Distilling the Knowledge from Conditional Normalizing Flows
Distilling the Knowledge from Conditional Normalizing Flows
Dmitry Baranchuk
Vladimir Aliev
Artem Babenko
BDL
85
2
0
24 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style
  Control
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
77
3
0
21 Jun 2021
Non-native English lexicon creation for bilingual speech synthesis
Non-native English lexicon creation for bilingual speech synthesis
Arun Baby
Pranav Jawale
Saranya Vinnaitherthan
Sumukh Badam
Nagaraj Adiga
Sharath Adavanne
44
8
0
21 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational
  Auto-Encoder For High Fidelity Flow-based Speech Synthesis
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis
Jian Cong
Shan Yang
Lei Xie
Jane Polak Scowcroft
DRL
110
29
0
21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Jane Polak Scowcroft
73
30
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in
  End-to-end Neural TTS
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
119
9
0
18 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
99
88
0
17 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional
  Text-to-Speech Model
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
66
35
0
17 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based
  Non-Parallel Voice Conversion
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion
Zhichao Wang
Xinyong Zhou
Fengyu Yang
Tao Li
Hongqiang Du
Lei Xie
Wendong Gan
Haitao Chen
Hai Li
65
22
0
16 Jun 2021
Improving the expressiveness of neural vocoding with non-affine
  Normalizing Flows
Improving the expressiveness of neural vocoding with non-affine Normalizing Flows
Adam Gabry's
Yunlong Jiao
V. Klimkov
Daniel Korzekwa
Roberto Barra-Chicote
48
1
0
16 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
D. Mohan
Qinmin Hu
Tian Huey Teh
Alexandra Torresquintero
C. Wallis
Marlene Staib
Lorenzo Foglianti
Jiameng Gao
Simon King
55
17
0
15 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram
  Discriminators for High-Fidelity Waveform Generation
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
116
132
0
15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
74
7
0
14 Jun 2021
HUI-Audio-Corpus-German: A high quality TTS dataset
HUI-Audio-Corpus-German: A high quality TTS dataset
Pascal Puchtler
Johannes Wirth
René Peinl
65
22
0
11 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis
  with Graph-based Multi-modal Context Modeling
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
Helen Meng
Chao Weng
Jane Polak Scowcroft
93
24
0
11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache
René Peinl
48
0
0
11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for
  End-to-End Text-to-Speech
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
167
903
0
11 Jun 2021
Improving multi-speaker TTS prosody variance with a residual encoder and
  normalizing flows
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows
Iván Vallés-Pérez
Julian Roth
Grzegorz Beringer
Roberto Barra-Chicote
J. Droppo
102
8
0
10 Jun 2021
NWT: Towards natural audio-to-video generation with representation
  learning
NWT: Towards natural audio-to-video generation with representation learning
Rayhane Mama
Marc S. Tyndel
Hashiam Kadhim
Cole Clifford
Ragavan Thurairatnam
VGen
112
12
0
08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
134
175
0
06 Jun 2021
An objective evaluation of the effects of recording conditions and
  speaker characteristics in multi-speaker deep neural speech synthesis
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis
Beáta Lőrincz
Adriana Stan
M. Giurgiu
33
2
0
03 Jun 2021
Speaker verification-derived loss and data augmentation for DNN-based
  multispeaker speech synthesis
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis
Beáta Lőrincz
Adriana Stan
M. Giurgiu
45
6
0
03 Jun 2021
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech
  Recognizer And Large Scale Synthetic Data
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data
N. Howard
Alex Park
T. Shabestary
A. Gruenstein
Rohit Prabhavalkar
52
17
0
01 Jun 2021
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and
  Controllable Speech Synthesis
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
Chenpeng Du
K. Yu
154
20
0
27 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
237
59
0
24 May 2021
High-Fidelity and Low-Latency Universal Neural Vocoder based on
  Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform
  Modeling
High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling
Patrick Lumban Tobing
Tomoki Toda
62
8
0
20 May 2021
Speaker disentanglement in video-to-speech conversion
Speaker disentanglement in video-to-speech conversion
Dan Oneaţă
Adriana Stan
H. Cucu
66
9
0
20 May 2021
Designing AI-based Conversational Agent for Diabetes Care in a
  Multilingual Context
Designing AI-based Conversational Agent for Diabetes Care in a Multilingual Context
Thuy-Trinh Nguyen
Kellie Sim
Anthony To Yiu Kuen
Ronald R. O'Donnell
Suan Tee Lim
Wenru Wang
Hoang D. Nguyen
30
3
0
20 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All
  You Need For Audio Generation
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
157
11
0
17 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
117
544
0
13 May 2021
Learning Robust Latent Representations for Controllable Speech Synthesis
Learning Robust Latent Representations for Controllable Speech Synthesis
Shakti Kumar
Jithin Pradeep
Hussain Zaidi
DRL
68
6
0
10 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework
MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Jinyin Chen
Linhui Ye
Zhaoyan Ming
65
7
0
10 May 2021
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Jinglin Liu
Chengxi Li
Yi Ren
Feiyang Chen
Zhou Zhao
DiffM
187
271
0
06 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
59
12
0
05 May 2021
Personalized Keyphrase Detection using Speaker and Environment
  Information
Personalized Keyphrase Detection using Speaker and Environment Information
R. Rikhye
Quan Wang
Qiao Liang
Yanzhang He
Ding Zhao
Yiteng Huang
Huang
A. Narayanan
Ian McGraw
54
11
0
28 Apr 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial
  Networks
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
Maja Pantic
112
47
0
27 Apr 2021
Daily Turking: Designing Longitudinal Daily-task Studies on Mechanical
  Turk
Daily Turking: Designing Longitudinal Daily-task Studies on Mechanical Turk
H.C.M. Turner
Simon Eberz
Ivan Martinovic
6
0
0
26 Apr 2021
Phrase break prediction with bidirectional encoder representations in
  Japanese text-to-speech synthesis
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis
Kosuke Futamata
Byeong-Cheol Park
Ryuichi Yamamoto
Kentaro Tachibana
35
14
0
26 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLMALM
94
25
0
20 Apr 2021
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
Saida Mussakhojayeva
Aigerim Janaliyeva
A. Mirzakhmetov
Yerbolat Khassanov
H. A. Varol
61
14
0
17 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model
  for Speech Synthesis with Explicit Pitch and Duration Prediction
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
71
9
0
16 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure
  for Expressive Text-to-Speech Synthesis
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
105
6
0
14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion
Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
41
24
0
14 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic
  Speech Recognition Architectures
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach
Mohammad Zeineldeen
Benedikt Hilmes
Ralf Schluter
Hermann Ney
72
12
0
12 Apr 2021
Previous
123...161718...242526
Next