
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
153
200
0
11 May 2020
GACELA -- A generative adversarial context encoder for long audio inpainting
Andrés Marafioti
P. Majdak
Nicki Holighaus
Nathanael Perraudin
100
46
0
11 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
73
42
0
10 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
Seung-won Park
Doo-young Kim
Myun-chul Joe
84
42
0
07 May 2020
AutoSpeech: Neural Architecture Search for Speaker Recognition
Shaojin Ding
Tianlong Chen
Xinyu Gong
Weiwei Zha
Zhangyang Wang
72
57
0
07 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
171
758
0
30 Apr 2020
CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech
S. Karlapati
Alexis Moinet
Arnaud Joly
V. Klimkov
Daniel Sáez-Trigueros
Thomas Drugman
44
67
0
30 Apr 2020
Conditional Spoken Digit Generation with StyleGAN
Kasperi Palkama
Lauri Juvela
Alexander Ilin
GAN
61
10
0
28 Apr 2020
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise
Shan Yang
Yuxuan Wang
Lei Xie
66
10
0
28 Apr 2020
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Yu Gu
Xiang Yin
Yonghui Rao
Yuan Wan
Benlai Tang
Yang Zhang
Jitong Chen
Yuxuan Wang
Zejun Ma
91
70
0
23 Apr 2020
Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit
Tomoki Koriyama
Hiroshi Saruwatari
BDL
64
5
0
22 Apr 2020
A Study of Non-autoregressive Model for Sequence Generation
Yi Ren
Jinglin Liu
Xu Tan
Zhou Zhao
Sheng Zhao
Tie-Yan Liu
109
62
0
22 Apr 2020
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma
Shun Kiyono
Kevin Duh
Shigeki Karita
Nelson Yalta
Tomoki Hayashi
Shinji Watanabe
118
166
0
21 Apr 2020
Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System
V. Phung
Phan Huy Kinh
Anh-Tuan Dinh
Quoc Bao Nguyen
35
5
0
20 Apr 2020
F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder
Kaizhi Qian
Zeyu Jin
M. Hasegawa-Johnson
G. J. Mysore
82
107
0
15 Apr 2020
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data
Soumi Maiti
Erik Marchi
Alistair Conkie
64
18
0
10 Apr 2020
Scalable Multilingual Frontend for TTS
Alistair Conkie
A. Finch
29
13
0
10 Apr 2020
Improving Readability for Automatic Speech Recognition Transcription
Junwei Liao
Sefik Emre Eskimez
Liyang Lu
Yu Shi
Ming Gong
Linjun Shou
Hong Qu
Michael Zeng
67
56
0
09 Apr 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
90
32
0
06 Apr 2020
Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset
Lee F. Callender
Curtis Hawthorne
Jesse Engel
107
21
0
01 Apr 2020
Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders
Wissam A. Jassim
Jan Skoglund
Michael Chinen
Andrew Hines
19
8
0
26 Mar 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
57
32
0
09 Mar 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
75
56
0
04 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Jing Xiao
52
21
0
04 Mar 2020
Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework
F. Bous
Luc Ardaillon
Axel Roebel
18
1
0
02 Mar 2020
Towards Automatic Face-to-Face Translation
Prajwal K R
Rudrabha Mukhopadhyay
Jerin Philip
Abhishek Jha
Vinay P. Namboodiri
C. V. Jawahar
CVBM
112
177
0
01 Mar 2020
Introduction to deep learning
Lihi Shiloh-Perl
Raja Giryes
67
0
0
29 Feb 2020
Semi-Supervised Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Enhong Chen
Tie-Yan Liu
99
90
0
24 Feb 2020
Interactive Text-to-Speech System via Joint Style Analysis
Yang Gao
Weiyi Zheng
Zhaojun Yang
Thilo Köhler
Christian Fuegen
Qing He
68
11
0
17 Feb 2020
Content Based Singing Voice Extraction From a Musical Mixture
Pritish Chandna
Merlijn Blaauw
J. Bonada
E. Gómez
89
14
0
12 Feb 2020
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA
Shehzeen Samarah Hussain
Mojan Javaheripi
Paarth Neekhara
Ryan Kastner
F. Koushanfar
50
21
0
09 Feb 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
56
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
98
93
0
06 Feb 2020
Vocoder-free End-to-End Voice Conversion with Transformer Network
June-Woo Kim
H. Jung
Minho Lee
45
4
0
05 Feb 2020
WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss
Rui Liu
Berrak Sisman
F. Bao
Guanglai Gao
Haizhou Li
125
14
0
02 Feb 2020
Training Keyword Spotters with Limited and Synthesized Speech Data
James Lin
Kevin Kilgour
Dominik Roblek
Matthew Sharifi
63
58
0
31 Jan 2020
SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis
Bohan Zhai
Tianren Gao
Flora Xue
D. Rothchild
Bichen Wu
Joseph E. Gonzalez
Kurt Keutzer
64
27
0
16 Jan 2020
Unsupervised Audiovisual Synthesis via Exemplar Autoencoders
Kangle Deng
Aayush Bansal
Deva Ramanan
SSLVGen
74
12
0
13 Jan 2020
Advbox: a toolbox to generate adversarial examples that fool neural networks
Dou Goodman
Xin Hao
Yang Wang
Yuesheng Wu
Junfeng Xiong
Huan Zhang
AAML
138
55
0
13 Jan 2020
Mel-spectrogram augmentation for sequence to sequence voice conversion
Yeongtae Hwang
Hyemin Cho
Hongsun Yang
Dong-Ok Won
Insoo Oh
Seong-Whan Lee
65
15
0
06 Jan 2020
Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network
Yusong Wu
Shengchen Li
Chengzhu Yu
Heng Lu
Chao Weng
Liqiang Zhang
Dong Yu
51
5
0
27 Dec 2019
Score and Lyrics-Free Singing Voice Generation
Jen-Yu Liu
Yu-Hua Chen
Yin-Cheng Yeh
Yi-Hsuan Yang
70
22
0
26 Dec 2019
Probing the phonetic and phonological knowledge of tones in Mandarin TTS models
Jian Zhu
62
8
0
23 Dec 2019
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schlüter
Hermann Ney
95
84
0
19 Dec 2019
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
Tomoki Toda
69
99
0
14 Dec 2019
Singing Synthesis: with a little help from my attention
Orazio Angelini
Alexis Moinet
K. Yanagisawa
Thomas Drugman
61
17
0
12 Dec 2019
Towards Robust Neural Vocoding for Speech Generation: A Survey
Po-Chun Hsu
Chun-hsuan Wang
Andy T. Liu
Hung-yi Lee
OOD
78
25
0
05 Dec 2019
WaveFlow: A Compact Flow-based Model for Raw Audio
Wei Ping
Kainan Peng
Kexin Zhao
Z. Song
102
117
0
03 Dec 2019
High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram
Leyuan Sheng
Dong-Yan Huang
Evgeny Nikolaevich Pavlovskiy
82
15
0
03 Dec 2019
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection
Shubhi Tyagi
M. Nicolis
Jonas Rohnke
Thomas Drugman
Jaime Lorenzo-Trueba
77
32
0
02 Dec 2019