1910.11480
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
25 October 2019
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
Papers citing
"Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram"
50 / 464 papers shown
Title
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
50
9
0
24 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Julien Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
61
7
0
17 Mar 2023
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
Hyun Joon Park
Seok Woo Yang
Jin Sob Kim
Wooseok Shin
S. W. Han
68
20
0
16 Mar 2023
TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge
Yukai Ju
Jun Chen
Shimin Zhang
Shulin He
Wei Rao
Wei-Ping Zhu
Yannan Wang
Tao Yu
Shidong Shang
130
14
0
14 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
94
29
0
03 Mar 2023
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
Shijun Wang
Jón Guðnason
Damian Borth
83
10
0
02 Mar 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
87
2
0
27 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
101
31
0
27 Feb 2023
Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
Nirmesh J. Shah
M. Singh
Naoya Takahashi
N. Onoe
91
15
0
21 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
161
3
0
18 Feb 2023
Hypernetworks build Implicit Neural Representations of Sounds
Filip Szatkowski
Karol J. Piczak
Przemysław Spurek
Jacek Tabor
Tomasz Trzciński
118
11
0
09 Feb 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
47
11
0
22 Jan 2023
Wind Power Scenario Generation Using Graph Convolutional Generative Adversarial Network
Young-Ho Cho
Shaohui Liu
Duehee Lee
Hao Zhu
142
5
0
19 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
Tomoki Toda
121
10
0
16 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
54
4
0
08 Dec 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
O. Watts
Lovisa Wihlborg
Cassia Valentini-Botinhao
73
3
0
25 Nov 2022
AERO: Audio Super Resolution in the Spectral Domain
Moshe Mandel
Or Tal
Yossi Adi
83
26
0
22 Nov 2022
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
Takenori Yoshimura
Shinji Takaki
Kazuhiro Nakamura
Keiichiro Oura
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
61
7
0
21 Nov 2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
86
14
0
19 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
85
46
0
17 Nov 2022
MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
Chang-Bin Jeon
Hyeongi Moon
Keunwoo Choi
Ben Sangbae Chon
Kyogu Lee
56
5
0
14 Nov 2022
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
Dorien Herremans
VLM
61
0
0
14 Nov 2022
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping
Junhyeok Lee
Seungu Han
Hyunjae Cho
Wonbin Jung
51
12
0
08 Nov 2022
Improving performance of real-time full-band blind packet-loss concealment with predictive network
Viet-Anh Nguyen
Anh H. T. Nguyen
Andy W. H. Khong
78
8
0
08 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua Wu
125
0
0
07 Nov 2022
HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks
Filip Szatkowski
Karol J. Piczak
Przemysław Spurek
Jacek Tabor
Tomasz Trzciński
117
13
0
03 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
97
15
0
02 Nov 2022
SIMD-size aware weight regularization for fast neural vocoding on CPU
Hiroki Kanagawa
Yusuke Ijima
115
0
0
02 Nov 2022
Neural Fourier Shift for Binaural Speech Rendering
Jinkyu Lee
Kyogu Lee
80
8
0
02 Nov 2022
Modelling black-box audio effects with time-varying feature modulation
Marco Comunità
C. Steinmetz
Huy Phan
Joshua D. Reiss
70
14
0
01 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
69
0
0
31 Oct 2022
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit
Ryuichi Yamamoto
Reo Yoneyama
Tomoki Toda
459
12
0
28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
86
11
0
28 Oct 2022
Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs
Reo Yoneyama
Ryuichi Yamamoto
Kentaro Tachibana
53
5
0
28 Oct 2022
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
133
674
0
24 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
60
0
0
24 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
88
7
0
23 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
114
8
0
20 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
Chihiro Watanabe
Hirokazu Kameoka
DRL
112
0
0
20 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
Xin Wang
Junichi Yamagishi
116
43
0
19 Oct 2022
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion
D. Ma
Lester Phillip Violeta
Kazuhiro Kobayashi
Tomoki Toda
62
8
0
19 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
72
16
0
14 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
84
16
0
12 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gökçe İymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
115
57
0
06 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
114
30
0
03 Oct 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
136
309
0
30 Sep 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai
Yuki Saito
K. Udagawa
Hiroshi Saruwatari
AAML
80
1
0
26 Sep 2022
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
29
2
0
22 Sep 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
79
6
0
22 Sep 2022
An Initial study on Birdsong Re-synthesis Using Neural Vocoders
Rhythm Bhatia
Tomi Kinnunen
49
1
0
21 Sep 2022