Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown

EE-TTS: Emphatic Expressive TTS with Linguistic Information. Yifan Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun. 13 3 0. 20 May 2023
An Android Robot Head as Embodied Conversational Agent. Marcel Heisler, C. Becker-Asano. [LM&Ro, LLMAG] 29 0 0. 18 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs. Won Jang, D. Lim, Heayoung Park. 19 1 0. 18 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training. Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao. [CLIP] 26 20 0. 18 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis. Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao. [DiffM] 19 19 0. 18 May 2023
Using Deepfake Technologies for Word Emphasis Detection. Eran Kaufman, Lee-Ad Gottlieb. 14 0 0. 12 May 2023
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation. Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen An Li, Tsung-Yuan Hsu, Shinji Watanabe, Hung-yi Lee. 17 4 0. 12 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo. [DiffM] 30 40 0. 11 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing. Jingbei Li, Sipan Li, Ping Chen, Lu Zhang, Yi Meng, Zhiyong Wu, H. Meng, Qiao Tian, Yuping Wang, Yuxuan Wang. 32 3 0. 09 May 2023
DiffVoice: Text-to-Speech with Latent Diffusion. Zhijun Liu, Yiwei Guo, K. Yu. [DiffM] 22 22 0. 23 Apr 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model. Jianzong Wang, Xulong Zhang, Haobin Tang, Aolan Sun, Ning Cheng, Jing Xiao. 18 1 0. 23 Apr 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers. Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian. [DiffM] 15 221 0. 18 Apr 2023
Enhancing Speech-to-Speech Translation with Multiple TTS Targets. Jiatong Shi, Yun Tang, Ann Lee, H. Inaguma, Changhan Wang, J. Pino, Shinji Watanabe. 38 9 0. 10 Apr 2023
DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection. Amit Kumar Singh Yadav, Kratika Bhagtani, Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp. [DRL] 28 6 0. 06 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki. 21 9 0. 24 Mar 2023
FaceChat: An Emotion-Aware Face-to-face Dialogue Framework. Deema Alnuhait, Qingyang Wu, Zhou Yu. 14 7 0. 08 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model. Rui Xue, Yanqing Liu, Lei He, Xuejiao Tan, Linquan Liu, Ed Lin, Sheng Zhao. 26 7 0. 06 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model. Haolin Chen, Philip N. Garner. [DiffM] 31 1 0. 03 Mar 2023
Leveraging Large Text Corpora for End-to-End Speech Summarization. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, A. Ogawa, Marc Delcroix, Ryo Masumura. 27 14 0. 02 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations. N. Shah, Saiteja Kosgi, Vishal Tambrahalli, Neha Sahipjohn, Anil Nelakanti, Vineet Gandhi. 17 8 0. 01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis. Ji-Hoon Kim, Hongying Yang, Yooncheol Ju, Il-Hwan Kim, Byeong-Yeol Kim. 22 8 0. 28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech. Kentaro Mitsui, Yukiya Hono, Kei Sawada. [CVBM] 11 6 0. 28 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow. Yoonhyung Lee, Jinhyeok Yang, Kyomin Jung. 17 6 0. 27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS. Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, Jaehwan Kim. 17 3 0. 24 Feb 2023
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises. Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Mingxuan Wang. [DiffM] 30 44 0. 20 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion. Houjian Guo, Chaoran Liu, C. Ishi, H. Ishiguro. [BDL] 17 12 0. 16 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. Li-Wei Chen, Shinji Watanabe, Alexander I. Rudnicky. 8 35 0. 08 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt. Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, H. Meng. [DiffM, VLM] 31 85 0. 31 Jan 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari. 32 17 0. 30 Jan 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS. Rohan Badlani, Rafael Valle, Kevin J. Shih, J. F. Santos, Siddharth Gururani, Bryan Catanzaro. 16 6 0. 24 Jan 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions. Yinghao Aaron Li, Cong Han, Xilin Jiang, N. Mesgarani. 17 22 0. 20 Jan 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes. Kimberly T. Mai, Sergi D. Bray, Toby O. Davies, Lewis D. Griffin. 37 40 0. 19 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion. Hao Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, J. Tao. 18 3 0. 10 Jan 2023
Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation. Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda. 6 1 0. 05 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, ..., Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei. 43 639 0. 05 Jan 2023
Source Tracing: Detecting Voice Spoofing. Tinglong Zhu, Xingming Wang, Xiaoyi Qin, Ming Li. 24 10 0. 16 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. Yusuke Yasuda, T. Toda. [DiffM] 15 7 0. 16 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis. Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh. 18 0 0. 15 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis. Yinjiao Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Linfu Xie, Dan Su. [DiffM] 13 8 0. 03 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech. Byoung Jin Choi, Myeonghun Jeong, Joun Yeop Lee, N. Kim. 15 12 0. 30 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions. Christoph Minixhofer, Ondřej Klejch, P. Bell. 23 7 0. 29 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan. 25 1 0. 25 Nov 2022
IMaSC -- ICFOSS Malayalam Speech Corpus. D. Gopinath, K. Thennal D, Vrinda V. Nair, S. Swaraj K, G. Sachin. [AuLLM] 18 1 0. 23 Nov 2022
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System. Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda. 13 7 0. 21 Nov 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling. Xinfa Zhu, Yinjiao Lei, Kun Song, Yongmao Zhang, Tao Li, Linfu Xie. 11 16 0. 19 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users. Gokul Karthik Kumar, V. Praveen S., Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar. 36 18 0. 17 Nov 2022
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. Yiwei Guo, Chenpeng Du, Xie Chen, K. Yu. [DiffM] 52 39 0. 17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis. Hyeong-Seok Choi, Jinhyeok Yang, Juheon Lee, Hyeongju Kim. 18 46 0. 17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models. Minki Kang, Dong Min, Sung Ju Hwang. [DiffM] 25 48 0. 17 Nov 2022
Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation. Xin Yuan, Robin Feng, Mingming Ye. 14 3 0. 17 Nov 2022