Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Shogo Seki 23 4 0 14 Aug 2023
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation Zhichao Wang M. Dai Keld Lundgaard VGen DiffM 43 2 0 12 Aug 2023
Towards an AI to Win Ghana's National Science and Maths Quiz George Boateng Jonathan Abrefah Mensah Kevin Takyi Yeboah William Edor Andrew Kojo Mensah-Onumah Naafi Dasana Ibrahim Nana Sam Yeboah 11 3 0 08 Aug 2023
A Systematic Exploration of Joint-training for Singing Voice Synthesis Yuning Wu Yifeng Yu Jiatong Shi Tao Qian Qin Jin 38 5 0 05 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation Minsu Kim J. Choi Dahun Kim Y. Ro 35 10 0 03 Aug 2023
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis Ramanan Sivaguru Vasista Sai Lodagala S. Umesh 14 2 0 02 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training H. Oh Sang-Hoon Lee Seong-Whan Lee DiffM 15 14 0 31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design Jungil Kong Jihoon Park Beomjeong Kim Jeongmin Kim Dohee Kong Sangjin Kim 29 35 0 31 Jul 2023
Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation Yuan-Ping Chen 16 1 0 30 Jul 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer Sang-Hoon Lee Haram Choi H. Oh Seong-Whan Lee BDL 23 9 0 30 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis Ziyue Jiang Jinglin Liu Yi Ren Jinzheng He Zhe Ye ... Pengfei Wei Chunfeng Wang Xiang Yin Zejun Ma Zhou Zhao 33 44 0 14 Jul 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task Kun Song Yinjiao Lei Pei-Ning Chen Yiqing Cao Kun Wei Yongmao Zhang Linfu Xie Ning Jiang Guoqing Zhao 27 1 0 10 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review J. Barnett 16 25 0 07 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading Yujia Xiao Shaofei Zhang Xi Wang Xuejiao Tan Lei He Sheng Zhao Frank Soong Tan Lee 17 5 0 03 Jul 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech Daria Diatlova V. Shutov 26 7 0 28 Jun 2023
Two-Stage Voice Anonymization for Enhanced Privacy F. Nespoli Daniel Barreda Joerg Bitzer Patrick A. Naylor 19 3 0 28 Jun 2023
The Singing Voice Conversion Challenge 2023 Wen-Chin Huang Lester Phillip Violeta Songxiang Liu Jiatong Shi T. Toda 16 46 0 26 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech Sen Liu Yiwei Guo Chenpeng Du Xie Chen Kai Yu 24 6 0 25 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Matt Le Apoorv Vyas Bowen Shi Brian Karrer Leda Sari ... Mary Williamson Vimal Manohar Yossi Adi Jay Mahadeokar Wei-Ning Hsu AuLLM 28 264 0 23 Jun 2023
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer Ammar Abbas S. Karlapati Bastian Schnell Penny Karanasou M. G. Moya Amith Nagaraj Ayman Boustati Nicole Peinelt Alexis Moinet Thomas Drugman 25 3 0 20 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning J. Yoon Sunghwan Ahn Hyeon Seung Lee Minchan Kim Seokhwan Kim N. Kim VLM 25 2 0 14 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? Takanori Ashihara Takafumi Moriya Kohei Matsuura Tomohiro Tanaka Yusuke Ijima Taichi Asami Marc Delcroix Yukinori Honma SSL ELM 27 11 0 14 Jun 2023
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects Xinghua Qu Hongyang Liu Zhu Sun Xiang Yin Yew-Soon Ong Lu Lu Zejun Ma 29 3 0 14 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Yinghao Aaron Li Cong Han Vinay S. Raghavan Gavin Mischler N. Mesgarani VLM DiffM 37 107 0 13 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling Ji-Sang Hwang Sang-Hoon Lee Seong-Whan Lee 16 4 0 13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models Ji-Sang Hwang Sang-Hoon Lee Seong-Whan Lee DiffM 30 8 0 12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN Rithesh Kumar Prem Seetharaman Alejandro Luebs I. Kumar Kundan Kumar 33 282 0 11 Jun 2023
KIT's Multilingual Speech Translation System for IWSLT 2023 Danni Liu Thai-Binh Nguyen Sai Koneru Enes Yavuz Ugan Ngoc-Quan Pham Tuan-Nam Nguyen Tu Anh Dinh Carlos Mullov A. Waibel J. Niehues 18 6 0 08 Jun 2023
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis Junhyeok Lee Hyeonuk Nam Yong-Hwa Park 11 4 0 08 Jun 2023
FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator Massa Baali Ahmed M. Ali 14 1 0 07 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge Wenhao Guan Tao Li Yishuang Li Hukai Huang Q. Hong Lin Li DiffM 27 6 0 07 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization Kohei Matsuura Takanori Ashihara Takafumi Moriya Tomohiro Tanaka Takatomo Kano A. Ogawa Marc Delcroix 29 9 0 07 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias Ziyue Jiang Yi Ren Zhe Ye Jinglin Liu Chen Zhang ... Rongjie Huang Chunfeng Wang Xiang Yin Zejun Ma Zhou Zhao DiffM 32 73 0 06 Jun 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis Zhe Ye Ziyue Jiang Yi Ren Jinglin Liu Chen Zhang Xiang Yin Zejun Ma Zhou Zhao 40 4 0 06 Jun 2023
PolyVoice: Language Models for Speech to Speech Translation Qianqian Dong Zhiying Huang Qiao Tian Chen Xu Tom Ko ... Lu Lu Zejun Ma Yuping Wang Mingxuan Wang Yuxuan Wang 20 23 0 05 Jun 2023
Coupled Variational Autoencoder Xiaoran Hao Patrick Shafto BDL DRL 19 4 0 05 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality Fabian Kögel Bac Nguyen Fabien Cardinaux 14 2 0 02 Jun 2023
Text-to-Speech Pipeline for Swiss German -- A comparison Tobias Bollinger Jan Deriu Manfred Vogel DiffM 16 0 0 31 May 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech L. T. Nguyen Thinh-Le-Gia Pham Dat Quoc Nguyen 24 13 0 31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer Yerin Choi M. Koo 25 0 0 31 May 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions Guanghou Liu Yongmao Zhang Yinjiao Lei Yunlin Chen Rui Wang Zhifei Li Linfu Xie 16 37 0 31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation Rongjie Huang Chunlei Zhang Yongqiang Wang Dongchao Yang Lu Liu Zhenhui Ye Ziyue Jiang Chao Weng Zhou Zhao Dong Yu DiffM 29 26 0 30 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS Sewade Ogun Vincent Colotte Emmanuel Vincent DiffM 27 4 0 28 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis Seong-Hyun Park Bohyung Kim Tae-Hyun Oh 32 1 0 26 May 2023
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion Haram Choi Sang-Hoon Lee Seong-Whan Lee DiffM 13 26 0 25 May 2023
EfficientSpeech: An On-Device Text to Speech Model Rowel Atienza 23 4 0 23 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models Minki Kang Wooseok Han S. Hwang Eunho Yang DiffM 23 16 0 23 May 2023
ADD 2023: the Second Audio Deepfake Detection Challenge Jiangyan Yi Jianhua Tao Ruibo Fu Xinrui Yan Chenglong Wang ... Zhengqi Wen Shan Liang Zheng Lian Shuai Nie Haizhou Li 84 94 0 23 May 2023
Scaling Speech Technology to 1,000+ Languages Vineel Pratap Andros Tjandra Bowen Shi Paden Tomasello Arun Babu ... Yossi Adi Xiaohui Zhang Wei-Ning Hsu Alexis Conneau Michael Auli VLM 77 298 0 22 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer Huadai Liu Rongjie Huang Xuan Lin Wenqiang Xu Maozong Zheng Hong Chen Jinzheng He Zhou Zhao DiffM 31 20 0 22 May 2023