1712.05884
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
193
727
0
05 Jan 2023
Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling
Amitay Sicherman
Yossi Adi
95
37
0
02 Jan 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
80
19
0
29 Dec 2022
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
62
2
0
28 Dec 2022
Source Tracing: Detecting Voice Spoofing
Tinglong Zhu
Xingming Wang
Xiaoyi Qin
Ming Li
65
18
0
16 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
Yusuke Yasuda
Tomoki Toda
DiffM
79
8
0
16 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
Tomoki Toda
121
10
0
16 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
75
0
0
15 Dec 2022
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator
Amrutha Prasad
Juan Pablo Zuluaga
P. Motlícek
Seyyed Saeed Sarfjoo
Iuliia Nigmatulina
Karel Veselý
61
3
0
14 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
85
10
0
14 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
74
1
0
11 Dec 2022
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
Rishabh Dabral
Muhammad Hamza Mughal
Vladislav Golyanik
Christian Theobalt
DiffM
VGen
111
183
0
08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
88
16
0
08 Dec 2022
GreenEyes: An Air Quality Evaluating Model based on WaveNet
Kan Huang
Kai Zhang
Ming-de Liu
24
2
0
08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming-Hsuan Yang
Qin Huang
141
27
0
08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Ankur Debnath
Shridevi S Patil
Gangotri Nadiger
R. Ganesan
67
21
0
07 Dec 2022
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
48
1
0
07 Dec 2022
Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue
Daxin Tan
Nikos Kargas
David McHardy
C. Papayiannis
Antonio Bonafonte
Marek Střelec
Jonas Rohnke
A. Filandras
Trevor Wood
53
0
0
07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
89
2
0
06 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Jane Polak Scowcroft
DiffM
80
9
0
03 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
104
13
0
30 Nov 2022
Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses
Yang Ai
Zhenhua Ling
64
27
0
29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
83
3
0
26 Nov 2022
Contextual Expressive Text-to-Speech
Jianhong Tu
Zeyu Cui
Xiaohuan Zhou
Siqi Zheng
Kaiqin Hu
Ju Fan
Chang Zhou
51
3
0
26 Nov 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
O. Watts
Lovisa Wihlborg
Cassia Valentini-Botinhao
73
3
0
25 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
Muyang Du
Chuan Liu
Jiaxing Qi
Junjie Lai
52
1
0
25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Xuan Shi
Erica Cooper
Xin Wang
Junichi Yamagishi
Shrikanth Narayanan
69
1
0
25 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs
Harm Lameris
Shivam Mehta
G. Henter
Joakim Gustafson
Éva Székely
66
15
0
24 Nov 2022
3d human motion generation from the text via gesture action classification and the autoregressive model
Gwantae Kim
Youngsuk Ryu
Junyeop Lee
D. Han
Jeongmin Bae
Hanseok Ko
39
2
0
18 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
92
22
0
17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
79
5
0
16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
40
1
0
16 Nov 2022
General Intelligence Requires Rethinking Exploration
Minqi Jiang
Tim Rocktaschel
Edward Grefenstette
LRM
81
20
0
15 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
111
9
0
13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
107
13
0
13 Nov 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
Ya-Jie Zhang
Wei Song
Ya Yue
Zhengchen Zhang
Youzheng Wu
Xiaodong He
64
7
0
11 Nov 2022
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
Gaku Narita
Junichi Shimizu
Taketo Akama
GAN
82
11
0
10 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
83
6
0
07 Nov 2022
Deliberation Networks and How to Train Them
Qingyun Dou
Mark Gales
63
0
0
06 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
67
2
0
06 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
85
18
0
04 Nov 2022
Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts
Detai Xin
Sharath Adavanne
F. Ang
Ashish Kulkarni
Shinnosuke Takamichi
Hiroshi Saruwatari
103
14
0
04 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas
Karolos Nikitaras
Nikolaos Ellinas
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
64
0
0
02 Nov 2022
SpectroMap: Peak detection algorithm for audio fingerprinting
A. López-García
43
0
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
60
1
0
01 Nov 2022
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices
K. Markopoulos
Georgia Maniati
G. Vamvoukakis
Nikolaos Ellinas
Georgios Vardaxoglou
...
Gunu Jho
Inchul Hwang
Aimilios Chalamandaris
Pirros Tsiakoulis
S. Raptis
83
1
0
01 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai
Weiqing Wang
Ming Li
48
28
0
01 Nov 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents
Yongmao Zhang
Zhichao Wang
Pei-Yin Yang
Hongshen Sun
Zhisheng Wang
Linfu Xie
82
6
0
31 Oct 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
77
2
0
31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Anusha Prakash
H. Murthy
43
7
0
31 Oct 2022