ResearchTrend.AI

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Title
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Manh Luong
Viet-Anh Tran
24
2
0
27 Sep 2021
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning
Rui Li
Dong Pu
Minnie Huang
Bill Huang
86
14
0
23 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
72
3
0
22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
79
40
0
20 Sep 2021
On-device neural speech synthesis
Sivanand Achanta
Albert Antony
L. Golipour
Jiangchuan Li
T. Raitio
...
Francesco Rossi
Jennifer Shi
Jaimin Upadhyay
David Winarsky
Hepeng Zhang
108
17
0
17 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
Changhan Wang
Wei-Ning Hsu
Yossi Adi
Adam Polyak
Ann Lee
Peng-Jen Chen
Jiatao Gu
J. Pino
VLM
106
32
0
14 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
69
47
0
14 Sep 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Chuanxin Tang
Chong Luo
Zhiyuan Zhao
Dacheng Yin
Yucheng Zhao
Wenjun Zeng
66
9
0
12 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
Songxiang Liu
Shan Yang
Jane Polak Scowcroft
Dong Yu
AI4TS
62
10
0
08 Sep 2021
Benchmarking and challenges in security and privacy for voice biometrics
J. Bonastre
Héctor Delgado
Nicholas W. D. Evans
Tomi Kinnunen
Kong Aik Lee
...
Massimiliano Todisco
N. Tomashenko
Emmanuel Vincent
Xin Wang
Junichi Yamagishi
88
9
0
01 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
102
18
0
30 Aug 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
88
19
0
25 Aug 2021
One TTS Alignment To Rule Them All
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
81
85
0
23 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues
Junjie H. Xu
Zhou Fang
Qihang Chen
Satoru Ohno
Pujana Paliyawan
42
4
0
18 Aug 2021
Combining speakers of multiple languages to improve quality of neural voices
Javier Latorre
Charlotte Bailleul
Tuuli H. Morrill
Alistair Conkie
Y. Stylianou
64
8
0
17 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
162
6
0
16 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech
Abdelhamid Ezzerg
Adam Gabryś
Bartosz Putrycz
Daniel Korzekwa
Daniel Sáez-Trigueros
David McHardy
Kamil Pokora
Jakub Lachowicz
Jaime Lorenzo-Trueba
V. Klimkov
140
6
0
13 Aug 2021
RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform
Youxuan Ma
Zongze Ren
Shugong Xu
85
40
0
12 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
Xinsheng Wang
Qicong Xie
Jihua Zhu
Lei Xie
O. Scharenborg
120
19
0
09 Aug 2021
SpecMix : A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
101
45
0
06 Aug 2021
An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures
Dengfeng Ke
Yuxing Lu
Xudong Liu
Yanyan Xu
Jing Sun
Cheng-Hao Cai
52
0
0
06 Aug 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning
Guangyan Zhang
Ying Qin
Daxin Tan
Tan Lee
77
4
0
05 Aug 2021
Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System
Yukiya Hono
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
53
39
0
05 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
61
21
0
04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis
Xudong Dai
Cheng Gong
Longbiao Wang
Kaili Zhang
46
2
0
04 Aug 2021
Creation and Detection of German Voice Deepfakes
Vanessa Barnekow
Dominik Binder
Niclas Kromrey
Pascal Munaretto
A. Schaad
Felix Schmieder
23
3
0
02 Aug 2021
End to End Bangla Speech Synthesis
Prithwiraj Bhattacharjee
Rajan Saha Raju
Arif Ahmad
M. S. Rahman
39
2
0
01 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
57
7
0
01 Aug 2021
Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language
Huiyan Li
Haohong Lin
You Wang
Hengyang Wang
Ming Zhang
Han Gao
Qing Ai
Zhiyuan Luo
Guang Li
63
14
0
31 Jul 2021
Practical Attacks on Voice Spoofing Countermeasures
Andre Kassis
Urs Hengartner
AAML
49
15
0
30 Jul 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
92
23
0
27 Jul 2021
Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations
L. Benaroya
Nicolas Obin
Axel Roebel
42
5
0
26 Jul 2021
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging
Csaba Zainkó
L. Tóth
Amin Honarmandi Shandiz
G. Gosztolya
Alexandra Markó
Géza Németh
Tamás Gábor Csapó
66
4
0
26 Jul 2021
Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds
Xuan Shi
Erica Cooper
Junichi Yamagishi
100
7
0
24 Jul 2021
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
Joanna Rownicka
Kilian Sprenkamp
A. Tripiana
Volodymyr Gromoglasov
Timo P. Kunz
26
0
0
21 Jul 2021
SVSNet: An End-to-end Speaker Voice Similarity Assessment Model
Cheng-Hung Hu
Yu-Huai Peng
Junichi Yamagishi
Yu Tsao
Hsin-Min Wang
55
5
0
20 Jul 2021
Human Perception of Audio Deepfakes
Nicolas Müller
Karla Markert
Konstantin Böttinger
121
50
0
20 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
105
73
0
19 Jul 2021
Parallel and High-Fidelity Text-to-Lip Generation
Jinglin Liu
Zhiying Zhu
Yi Ren
Wencan Huang
Baoxing Huai
N. Yuan
Zhou Zhao
55
10
0
14 Jul 2021
Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging
Tamás Gábor Csapó
41
2
0
12 Jul 2021
Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder
Manh Luong
Viet-Anh Tran
DRL
54
16
0
11 Jul 2021
A Deep-Bayesian Framework for Adaptive Speech Duration Modification
Ravi Shankar
A. Venkataraman
45
0
0
11 Jul 2021
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis
Hui Lu
Zhiyong Wu
Xixin Wu
Xu Li
Shiyin Kang
Xunying Liu
Helen Meng
69
12
0
07 Jul 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information
Qinghua Wu
Quanbo Shen
Jian Luan
YuJun Wang
72
4
0
07 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
96
41
0
04 Jul 2021
Supervised Contrastive Learning for Accented Speech Recognition
Tao Han
Hantao Huang
Ziang Yang
Wei Han
66
16
0
02 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
Dan Liu
Mengge Du
Xiaoxi Li
Yuchen Hu
Lirong Dai
99
21
0
01 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures
Prateek Verma
C. Chafe
79
29
0
30 Jun 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangens
S. Karlapati
Thomas Drugman
67
0
0
29 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021