Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.05884
Cited By
v1
v2 (latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
104
13
0
31 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
86
11
0
28 Oct 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation
Nobuyuki Morioka
Heiga Zen
Nanxin Chen
Yu Zhang
Yifan Ding
101
16
0
28 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
130
20
0
27 Oct 2022
Explicit Intensity Control for Accented Text-to-speech
Rui Liu
Haolin Zuo
De Hu
Guanglai Gao
Haizhou Li
103
7
0
27 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
383
8
0
27 Oct 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
95
8
0
26 Oct 2022
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
Chunhui Wang
Chang Zeng
Xing He
47
19
0
26 Oct 2022
Cover Reproducible Steganography via Deep Generative Models
Kejiang Chen
Hang Zhou
Yaofei Wang
Meng Li
Weiming Zhang
Neng H. Yu
DiffM
63
13
0
26 Oct 2022
Multilevel Transformer For Multimodal Emotion Recognition
Junyi He
Meimei Wu
Meng Li
Xiaobo Zhu
Feng Ye
64
6
0
26 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
AI4TS
58
5
0
25 Oct 2022
Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
56
1
0
25 Oct 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using
β
β
β
-VAE
Hui Lu
Disong Wang
Xixin Wu
Zhiyong Wu
Xunying Liu
Helen M. Meng
DRL
115
10
0
25 Oct 2022
Streaming Parrotron for on-device speech-to-speech conversion
Oleg Rybakov
Fadi Biadsy
Xia Zhang
Liyang Jiang
Phoenix Meadowlark
Shivani Agrawal
60
3
0
25 Oct 2022
Perfectly Secure Steganography Using Minimum Entropy Coupling
Christian Schroeder de Witt
Samuel Sokota
J. Zico Kolter
Jakob N. Foerster
Martin Strohmeier
150
37
0
24 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
60
0
0
24 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
90
7
0
23 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
64
1
0
18 Oct 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
119
2
0
14 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
60
14
0
14 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar
Singh
Yuki Mitsufuji
DiffM
72
16
0
14 Oct 2022
SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language
Ye-shi Jiang
SLR
36
1
0
13 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
84
16
0
12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
94
6
0
12 Oct 2022
Style-Guided Inference of Transformer for High-resolution Image Synthesis
Jonghwa Yim
Minjae Kim
ViT
103
0
0
11 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gokcce .Iymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
115
57
0
06 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
116
30
0
03 Oct 2022
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang
Shi Zong
Jianbing Zhang
Jiajun Chen
Hongfu Liu
69
5
0
02 Oct 2022
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
Itai Gat
Felix Kreuk
Tu Nguyen
Ann Lee
Jade Copet
Gabriel Synnaeve
Emmanuel Dupoux
Yossi Adi
75
11
0
30 Sep 2022
Facial Landmark Predictions with Applications to Metaverse
Qiaopeng Han
Jun Zhao
Kwok-Yan Lam
CVBM
36
0
0
29 Sep 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai
Yuki Saito
K. Udagawa
Hiroshi Saruwatari
AAML
85
1
0
26 Sep 2022
NWPU-ASLP System for the VoicePrivacy 2022 Challenge
Jixun Yao
Qing Wang
Li Zhang
Pengcheng Guo
Yuhao Liang
Linfu Xie
PICV
80
17
0
24 Sep 2022
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
32
2
0
22 Sep 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Haohan Guo
Fenglong Xie
Frank Soong
Xixin Wu
Helen M. Meng
78
12
0
22 Sep 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Yifan Hu
Pengkai Yin
Rui Liu
F. Bao
Guanglai Gao
40
5
0
22 Sep 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
79
6
0
22 Sep 2022
AutoLV: Automatic Lecture Video Generator
Wen Wang
Yang Song
Sanjay Jha
VGen
135
3
0
19 Sep 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings
M. Rahman
M. Graciarena
Diego Castán
Chris Cobo-Kroenke
Mitchell McLaren
A. Lawson
AAML
76
10
0
15 Sep 2022
Open Challenges in Synthetic Speech Detection
Luca Cuccovillo
Christoforos Papastergiopoulos
Anastasios Vafeiadis
Artem Yaroshchuk
P. Aichroth
K. Votis
Dimitrios Tzovaras
82
29
0
15 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Liumeng Xue
Frank Soong
Shaofei Zhang
Linfu Xie
73
23
0
14 Sep 2022
ConvNeXt Based Neural Network for Audio Anti-Spoofing
Qiaowei Ma
J. Zhong
Yitao Yang
Weiheng Liu
Yingbo Gao
W. W. Ng
AAML
124
6
0
14 Sep 2022
Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset
Michael Chinen
Jan Skoglund
Chandan K. A. Reddy
Alessandro Ragano
Andrew Hines
32
9
0
14 Sep 2022
Sporthesia: Augmenting Sports Videos Using Natural Language
Zhutian Chen
Qisen Yang
Xiao Xie
Johanna Beyer
Haijun Xia
Yingnian Wu
Hanspeter Pfister
DiffM
106
39
0
07 Sep 2022
Read it to me: An emotionally aware Speech Narration Application
Rishibha Bansal
55
0
0
06 Sep 2022
Towards Disentangled Speech Representations
Cal Peyser
Ronny Huang Andrew Rosenberg Tara N. Sainath
M. Picheny
Kyunghyun Cho
DRL
106
7
0
28 Aug 2022
The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation
Youngwoo Yoon
Pieter Wolfert
Taras Kucherenko
Carla Viegas
Teodor Nikolov
Mihail Tsakov
G. Henter
VGen
82
81
0
22 Aug 2022
Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale
Aditya Agarwal
Bipasha Sen
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
110
0
0
21 Aug 2022
Visualising Model Training via Vowel Space for Text-To-Speech Systems
Binu Abeysinghe
Jesin James
C. Watson
Felix Marattukalam
54
2
0
21 Aug 2022
Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer
S. Nercessian
101
9
0
15 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Xiang Li
Changhe Song
X. Wei
Zhiyong Wu
Jia Jia
Helen Meng
64
4
0
10 Aug 2022
Previous
1
2
3
...
9
10
11
...
24
25
26
Next