Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
92
4
0
03 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini
Goeric Huybrechts
M. Ribeiro
Adam Gabryś
Jaime Lorenzo-Trueba
67
5
0
29 Jul 2022
Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis
Raul Fernandez
David Haws
Guy Lorberbom
Slava Shechtman
A. Sorin
51
10
0
25 Jul 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
99
28
0
19 Jul 2022
GAFX: A General Audio Feature eXtractor
Zhaoyang Bu
Han Zhang
Xiaohu Zhu
58
0
0
19 Jul 2022
Distance Learner: Incorporating Manifold Prior to Model Training
Aditya Chetan
Nipun Kwatra
31
1
0
14 Jul 2022
Data Augmentation for Low-Resource Quechua ASR Improvement
Rodolfo Zevallos
Núria Bel
Guillermo Cámbara
Mireia Farrús
Jordi Luque
VLMSyDa
21
7
0
14 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
120
201
0
13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
87
10
0
13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate
Nabarun Goswami
Tatsuya Harada
78
5
0
13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS
Yookyung Shin
Younggun Lee
Suhee Jo
Yeongtae Hwang
Taesu Kim
100
14
0
13 Jul 2022
A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System
Yi-Chiao Wu
Patrick Lumban Tobing
Kazuki Yasuhara
Noriyuki Matsunaga
Yamato Ohtani
Tomoki Toda
69
0
0
13 Jul 2022
End-to-end speech recognition modeling from de-identified data
M. Flechl
Shou-Chun Yin
Junho Park
Peter Skala
44
5
0
12 Jul 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Naoki Makishima
Satoshi Suzuki
Atsushi Ando
Ryo Masumura
246
5
0
11 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
Yanqing Liu
Rui Xue
Lei He
Xu Tan
Sheng Zhao
89
25
0
11 Jul 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Tomoki Toda
66
17
0
10 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
Yongqiang Wang
Zhou Zhao
95
10
0
08 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
Josh Meyer
David Ifeoluwa Adelani
Edresson Casanova
A. Oktem
Daniel Whitenack
Julian Weber
...
Victor Akinode
Bernard Opoku
S. Olanrewaju
Jesujoba Oluwadara Alabi
Shamsuddeen Hassan Muhammad
43
23
0
07 Jul 2022
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training
Zewang Zhang
Yibin Zheng
Xinhui Li
Li Lu
DiffM
171
11
0
05 Jul 2022
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)
Ariadna Sánchez
Alessio Falai
Ziyao Zhang
Orazio Angelini
K. Yanagisawa
90
7
0
04 Jul 2022
Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)
Ziyao Zhang
Alessio Falai
Ariadna Sánchez
Orazio Angelini
K. Yanagisawa
56
4
0
04 Jul 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
101
16
0
04 Jul 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech
Keon Lee
Kyumin Park
Daeyoung Kim
LM&MA
115
46
0
03 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers
Liumeng Xue
Shan Yang
Na Hu
Jane Polak Scowcroft
Linfu Xie
51
2
0
02 Jul 2022
Building African Voices
Perez Ogayo
Graham Neubig
A. Black
122
15
0
01 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Kyle Kastner
Aaron Courville
57
0
0
30 Jun 2022
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
Hyun-Wook Yoon
Ohsung Kwon
Hoyeon Lee
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
Min-Jae Hwang
128
15
0
30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song
Ryuichi Yamamoto
Ohsung Kwon
Chan Song
Min-Jae Hwang
Suhyeon Oh
Hyun-Wook Yoon
Jin-Seob Kim
Jae-Min Kim
78
7
0
30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
Weinan Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
108
27
0
29 Jun 2022
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody
Peter Makarov
Ammar Abbas
Mateusz Lajszczak
Arnaud Joly
S. Karlapati
Alexis Moinet
Thomas Drugman
Penny Karanasou
89
16
0
29 Jun 2022
Expressive, Variable, and Controllable Duration Modelling in TTS
Ammar Abbas
Thomas Merritt
Alexis Moinet
S. Karlapati
Ewa Muszyńska
Simon Slangen
Elia Gatti
Thomas Drugman
65
10
0
28 Jun 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
102
0
0
28 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
S. Karlapati
Penny Karanasou
Mateusz Lajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Raymond Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
80
15
0
27 Jun 2022
Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
AAML
95
23
0
27 Jun 2022
Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
Wei-Ping Huang
Po-Chun Chen
Sung-Feng Huang
Hung-yi Lee
72
1
0
27 Jun 2022
Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach
Rohit Arora
27
0
0
27 Jun 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Marco Jiralerspong
Gauthier Gidel
VLM
81
3
0
25 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Kentaro Mitsui
Tianyu Zhao
Kei Sawada
Yukiya Hono
Yoshihiko Nankaku
K. Tokuda
67
14
0
24 Jun 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
Tae-Woo Kim
Minguk Kang
Gyeong-Hoon Lee
AAML
174
7
0
23 Jun 2022
A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data
Raviraj Joshi
Ashutosh Kumar Singh
100
10
0
22 Jun 2022
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
K. Udagawa
Yuki Saito
Hiroshi Saruwatari
28
6
0
21 Jun 2022
Understanding Robust Learning through the Lens of Representation Similarities
Christian Cianfarani
A. Bhagoji
Vikash Sehwag
Ben Y. Zhao
Prateek Mittal
Haitao Zheng
OOD
81
16
0
20 Jun 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Yuto Nishimura
Yuki Saito
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
AI4TS
59
8
0
16 Jun 2022
NatiQ: An End-to-end Text-to-Speech System for Arabic
Ahmed Abdelali
Nadir Durrani
C. Demiroğlu
Fahim Dalvi
Hamdy Mubarak
Kareem Darwish
77
14
0
15 Jun 2022
LPCSE: Neural Speech Enhancement through Linear Predictive Coding
Yang Liu
Na Tang
Xia Chu
Yang Yang
Jun Wang
68
1
0
14 Jun 2022
Adversarial Audio Synthesis with Complex-valued Polynomial Networks
Yongtao Wu
Grigorios G. Chrysos
Volkan Cevher
DiffM
144
4
0
14 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao
Zhong-Yu Li
Qi Han
Ming-Ming Cheng
Liang Wang
102
35
0
14 Jun 2022
Multi-instrument Music Synthesis with Spectrogram Diffusion
Curtis Hawthorne
Ian Simon
Adam Roberts
Neil Zeghidour
Josh Gardner
Ethan Manilow
Jesse Engel
DiffM
79
51
0
11 Jun 2022
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel
M. Behr
Fevziye Irem Eyiokur
Dogucan Yaman
Tuan-Nam Nguyen
Carlos Mullov
Mehmet Arif Demirtas
Alperen Kantarci
Stefan Constantin
H. K. Ekenel
CVBM
69
16
0
09 Jun 2022
A New Frontier of AI: On-Device AI Training and Personalization
Jijoong Moon
Parichay Kapoor
Ji Hoon Lee
Donghak Park
Seungbaek Hong
Hyungyu Lee
Donghyeon Jeong
Sungsik Kong
MyungJoo Ham
40
3
0
09 Jun 2022