Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training Yang Xiang Jesper Lisby Højvang M. Rasmussen M. G. Christensen DRL 21 5 0 16 Nov 2022
Super-resolution Reconstruction of Single Image for Latent features Xin Wang Jingkai Yan Jingyong Cai Jiankang Deng Qin Qin Yao Cheng DiffM 29 8 0 16 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing J. Webber Cassia Valentini-Botinhao Evelyn Williams G. Henter Simon King 11 9 0 13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 29 12 0 13 Nov 2022
Normative Modeling via Conditional Variational Autoencoder and Adversarial Learning to Identify Brain Dysfunction in Alzheimer's Disease Xuetong Wang K. Zhao Rong-Er Zhou Alex Leow R. Osorio Yu Zhang Lifang He OOD CML 6 6 0 13 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations Yoorim Oh Juheon Lee Yoseob Han Kyogu Lee 18 3 0 11 Nov 2022
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping Junhyeok Lee Seungu Han Hyunjae Cho Wonbin Jung 19 11 0 08 Nov 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer Yongmao Zhang Heyang Xue Hanzhao Li Linfu Xie Tingwei Guo Ruixiong Zhang Caixia Gong DiffM VLM 17 28 0 05 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis Konstantinos Klapsas Karolos Nikitaras Nikolaos Ellinas June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 13 0 0 02 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP Kun Song Yongmao Zhang Yinjiao Lei Jian Cong Hanzhao Li Linfu Xie Gang He Jinfeng Bai 51 15 0 02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis Karolos Nikitaras Konstantinos Klapsas Nikolaos Ellinas Georgia Maniati June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 14 0 0 01 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Georgia Maniati Panos Kakoulidis June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 18 2 0 31 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform Masaya Kawamura Yuma Shirahata Ryuichi Yamamoto Kentaro Tachibana 24 15 0 28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis Yuma Shirahata Ryuichi Yamamoto Eunwoo Song Ryo Terashima Jae-Min Kim Kentaro Tachibana 23 10 0 28 Oct 2022
Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs Reo Yoneyama Ryuichi Yamamoto Kentaro Tachibana 18 4 0 28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder Reo Yoneyama Yi-Chiao Wu T. Toda 41 26 0 27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion Jingyi Li Weiping Tu Li Xiao 46 96 0 27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations Haohan Guo Fenglong Xie Xixin Wu Hui Lu H. Meng 59 3 0 27 Oct 2022
Improving Speech-to-Speech Translation Through Unlabeled Text Xuan-Phi Nguyen Sravya Popuri Changhan Wang Yun Tang Ilia Kulikov Hongyu Gong 17 9 0 26 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 30 22 0 21 Oct 2022
Towards Relation Extraction From Speech Tongtong Wu Guitao Wang Jinming Zhao Zhaoran Liu Guilin Qi Yuan-Fang Li Gholamreza Haffari 29 11 0 17 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge Yan Jia Mihee Hong Jingyu Hou Kailong Ren Sifan Ma Jin Wang Fangzhen Peng Yinglin Ji Lin Yang Junjie Wang 25 1 0 14 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario Emily R. Bartusiak Edward J. Delp 19 12 0 14 Oct 2022
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar Aolan Sun Xulong Zhang Tiandong Ling Jianzong Wang Ning Cheng Jing Xiao 30 4 0 13 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 21 5 0 12 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data Ye Zhu Yuehua Wu N. Sebe Yan Yan 33 16 0 05 Oct 2022
Deep Generative Multimedia Children's Literature Matthew Lyle Olson 13 0 0 27 Sep 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech Yusuke Nakai Yuki Saito K. Udagawa Hiroshi Saruwatari AAML 17 1 0 26 Sep 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS Haohan Guo Fenglong Xie Frank Soong Xixin Wu Helen M. Meng 37 11 0 22 Sep 2022
Deep Speech Synthesis from Articulatory Representations Peter Wu Shinji Watanabe L. Goldstein A. Black Gopala K. Anumanchipalli 31 24 0 13 Sep 2022
Distributional Drift Adaptation with Temporal Conditional Variational Autoencoder for Multivariate Time Series Forecasting Hui He Qi Zhang Kun Yi Kaize Shi ZhenDong Niu Longbin Cao TTA AI4TS 16 4 0 01 Sep 2022
Domain Shift-oriented Machine Anomalous Sound Detection Model Based on Self-Supervised Learning Jinghao Yan Xin Wang Qin Wang Qin Qin Huan Li Pengyi Ye Yue-ping He Jing Zeng 31 1 0 31 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 16 4 0 03 Aug 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 44 193 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 28 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 15 5 0 13 Jul 2022
End-to-end speech recognition modeling from de-identified data M. Flechl Shou-Chun Yin Junho Park Peter Skala 13 4 0 12 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu Rui Xue Lei He Xu Tan Sheng Zhao 21 24 0 11 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqiang Wang Zhou Zhao 19 10 0 08 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus Josh Meyer David Ifeoluwa Adelani Edresson Casanova A. Oktem Daniel Whitenack Julian Weber ... Victor Akinode Bernard Opoku S. Olanrewaju Jesujoba Oluwadara Alabi Shamsuddeen Hassan Muhammad 16 21 0 07 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion Yinjiao Lei Shan Yang Jian Cong Linfu Xie Dan Su DiffM 45 12 0 05 Jul 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder Eunwoo Song Ryuichi Yamamoto Ohsung Kwon Chan Song Min-Jae Hwang Suhyeon Oh Hyun-Wook Yoon Jin-Seob Kim Jae-Min Kim 35 7 0 30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin W. Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 48 26 0 29 Jun 2022
STOP: A dataset for Spoken Task Oriented Semantic Parsing Paden Tomasello Akshat Shrivastava Daniel Lazar Po-Chun Hsu Duc Le ... Robin Algayres Tu Nguyen Emmanuel Dupoux Luke Zettlemoyer Abdel-rahman Mohamed 17 35 0 29 Jun 2022
Expressive, Variable, and Controllable Duration Modelling in TTS Ammar Abbas Thomas Merritt Alexis Moinet S. Karlapati Ewa Muszyńska Simon Slangen Elia Gatti Thomas Drugman 22 10 0 28 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer S. Karlapati Penny Karanasou Mateusz Lajszczak Ammar Abbas Alexis Moinet Peter Makarov Raymond Li Arent van Korlaar Simon Slangen Thomas Drugman 14 15 0 27 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue Kentaro Mitsui Tianyu Zhao Kei Sawada Yukiya Hono Yoshihiko Nankaku K. Tokuda 20 14 0 24 Jun 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech Ziyue Jiang Zhe Su Zhou Zhao Qian Yang Yi Ren Jinglin Liu Zhe Ye 24 4 0 05 Jun 2022
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation Kun Song Heyang Xue Xinsheng Wang Jian Cong Yongmao Zhang Linfu Xie Bing Yang Xiong Zhang Dan Su 11 5 0 01 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 33 38 0 30 May 2022