2106.06103
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
Papers citing
"Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"
50 / 491 papers shown
Title
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training
Yang Xiang
Jesper Lisby Højvang
M. Rasmussen
M. G. Christensen
DRL
21
5
0
16 Nov 2022
Super-resolution Reconstruction of Single Image for Latent features
Xin Wang
Jingkai Yan
Jingyong Cai
Jiankang Deng
Qin Qin
Yao Cheng
DiffM
29
8
0
16 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
11
9
0
13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
29
12
0
13 Nov 2022
Normative Modeling via Conditional Variational Autoencoder and Adversarial Learning to Identify Brain Dysfunction in Alzheimer's Disease
Xuetong Wang
K. Zhao
Rong-Er Zhou
Alex Leow
R. Osorio
Yu Zhang
Lifang He
OOD
CML
6
6
0
13 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
Yoorim Oh
Juheon Lee
Yoseob Han
Kyogu Lee
18
3
0
11 Nov 2022
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping
Junhyeok Lee
Seungu Han
Hyunjae Cho
Wonbin Jung
19
11
0
08 Nov 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Yongmao Zhang
Heyang Xue
Hanzhao Li
Linfu Xie
Tingwei Guo
Ruixiong Zhang
Caixia Gong
DiffM
VLM
17
28
0
05 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas
Karolos Nikitaras
Nikolaos Ellinas
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
13
0
0
02 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
51
15
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
14
0
0
01 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
18
2
0
31 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Masaya Kawamura
Yuma Shirahata
Ryuichi Yamamoto
Kentaro Tachibana
24
15
0
28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
23
10
0
28 Oct 2022
Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs
Reo Yoneyama
Ryuichi Yamamoto
Kentaro Tachibana
18
4
0
28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
Reo Yoneyama
Yi-Chiao Wu
T. Toda
41
26
0
27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
46
96
0
27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Haohan Guo
Fenglong Xie
Xixin Wu
Hui Lu
H. Meng
59
3
0
27 Oct 2022
Improving Speech-to-Speech Translation Through Unlabeled Text
Xuan-Phi Nguyen
Sravya Popuri
Changhan Wang
Yun Tang
Ilia Kulikov
Hongyu Gong
17
9
0
26 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
30
22
0
21 Oct 2022
Towards Relation Extraction From Speech
Tongtong Wu
Guitao Wang
Jinming Zhao
Zhaoran Liu
Guilin Qi
Yuan-Fang Li
Gholamreza Haffari
29
11
0
17 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
25
1
0
14 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
19
12
0
14 Oct 2022
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar
Aolan Sun
Xulong Zhang
Tiandong Ling
Jianzong Wang
Ning Cheng
Jing Xiao
30
4
0
13 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
21
5
0
12 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
Deep Generative Multimedia Children's Literature
Matthew Lyle Olson
13
0
0
27 Sep 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai
Yuki Saito
K. Udagawa
Hiroshi Saruwatari
AAML
17
1
0
26 Sep 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Haohan Guo
Fenglong Xie
Frank Soong
Xixin Wu
Helen M. Meng
37
11
0
22 Sep 2022
Deep Speech Synthesis from Articulatory Representations
Peter Wu
Shinji Watanabe
L. Goldstein
A. Black
Gopala K. Anumanchipalli
31
24
0
13 Sep 2022
Distributional Drift Adaptation with Temporal Conditional Variational Autoencoder for Multivariate Time Series Forecasting
Hui He
Qi Zhang
Kun Yi
Kaize Shi
ZhenDong Niu
Longbing Cao
TTA
AI4TS
16
4
0
01 Sep 2022
Domain Shift-oriented Machine Anomalous Sound Detection Model Based on Self-Supervised Learning
Jinghao Yan
Xin Wang
Qin Wang
Qin Qin
Huan Li
Pengyi Ye
Yue-ping He
Jing Zeng
31
1
0
31 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
16
4
0
03 Aug 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
44
193
0
13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
28
10
0
13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate
Nabarun Goswami
Tatsuya Harada
15
5
0
13 Jul 2022
End-to-end speech recognition modeling from de-identified data
M. Flechl
Shou-Chun Yin
Junho Park
Peter Skala
13
4
0
12 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
Yanqing Liu
Rui Xue
Lei He
Xu Tan
Sheng Zhao
21
24
0
11 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
Yongqiang Wang
Zhou Zhao
19
10
0
08 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
Josh Meyer
David Ifeoluwa Adelani
Edresson Casanova
A. Oktem
Daniel Whitenack
Julian Weber
...
Victor Akinode
Bernard Opoku
S. Olanrewaju
Jesujoba Oluwadara Alabi
Shamsuddeen Hassan Muhammad
16
21
0
07 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Dan Su
DiffM
45
12
0
05 Jul 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song
Ryuichi Yamamoto
Ohsung Kwon
Chan Song
Min-Jae Hwang
Suhyeon Oh
Hyun-Wook Yoon
Jin-Seob Kim
Jae-Min Kim
35
7
0
30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
W. Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
48
26
0
29 Jun 2022
STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello
Akshat Shrivastava
Daniel Lazar
Po-Chun Hsu
Duc Le
...
Robin Algayres
Tu Nguyen
Emmanuel Dupoux
Luke Zettlemoyer
Abdel-rahman Mohamed
17
35
0
29 Jun 2022
Expressive, Variable, and Controllable Duration Modelling in TTS
Ammar Abbas
Thomas Merritt
Alexis Moinet
S. Karlapati
Ewa Muszyńska
Simon Slangen
Elia Gatti
Thomas Drugman
22
10
0
28 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
S. Karlapati
Penny Karanasou
Mateusz Lajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Raymond Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
14
15
0
27 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Kentaro Mitsui
Tianyu Zhao
Kei Sawada
Yukiya Hono
Yoshihiko Nankaku
K. Tokuda
20
14
0
24 Jun 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang
Zhe Su
Zhou Zhao
Qian Yang
Yi Ren
Jinglin Liu
Zhe Ye
24
4
0
05 Jun 2022
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
Kun Song
Heyang Xue
Xinsheng Wang
Jian Cong
Yongmao Zhang
Linfu Xie
Bing Yang
Xiong Zhang
Dan Su
11
5
0
01 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
33
38
0
30 May 2022