Title
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 74 7 0 14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 167 902 0 11 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning Rayhane Mama Marc S. Tyndel Hashiam Kadhim Cole Clifford Ragavan Thurairatnam VGen 112 12 0 08 Jun 2021
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization A. Lahiri Vivek Kwatra C. Frueh J. P. Lewis C. Bregler 3DH 86 102 0 08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dong Min Dong Bok Lee Eunho Yang Sung Ju Hwang 134 175 0 06 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis Shakti Kumar Jithin Pradeep Hussain Zaidi DRL 68 6 0 10 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework Jinyin Chen Linhui Ye Zhaoyan Ming 65 7 0 10 May 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability Rui Liu Berrak Sisman Haizhou Li 69 32 0 03 Apr 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee Kyumin Park Daeyoung Kim 69 32 0 17 Mar 2021
Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks Chitralekha Gupta Purnima Kamath L. Wyse 49 9 0 12 Mar 2021
Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system Noé Tits Kevin El Haddad Thierry Dutoit 69 5 0 06 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech C. Chien Jheng-hao Lin Chien-yu Huang Po-Chun Hsu Hung-yi Lee 119 70 0 06 Mar 2021
Disentangled Sequence Clustering for Human Intention Inference Mark Zolotas Y. Demiris DRL 82 5 0 23 Jan 2021
Hierarchical disentangled representation learning for singing voice conversion Naoya Takahashi M. Singh Yuki Mitsufuji DRL 60 14 0 18 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Wei-Ning Hsu David Harwath Christopher Song James R. Glass CLIP 90 67 0 31 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling Chen Zhang Yi Ren Xu Tan Jinglin Liu Ke-jun Zhang Tao Qin Sheng Zhao Tie-Yan Liu DiffM 97 38 0 17 Dec 2020
Measuring Disentanglement: A Review of Metrics M. Carbonneau Julian Zaïdi Jonathan Boilard G. Gagnon CoGe DRL 89 85 0 16 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis Neeraj Kumar Srishti Goel Ankur Narang Brejesh Lall 68 5 0 14 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech Yiling Huang Yutian Chen Jason W. Pelecanos Quan Wang 98 12 0 24 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis C. Chien Hung-yi Lee 91 36 0 12 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis Erica Cooper Xin Wang Yi Zhao Yusuke Yasuda Junichi Yamagishi SyDa 50 3 0 10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 116 21 0 08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 99 101 0 06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis Guanghui Xu Wei Song Zhengchen Zhang Chao Zhang Xiaodong He Bowen Zhou 62 50 0 06 Nov 2020
Speech Synthesis and Control Using Differentiable DSP Giorgio Fabbro Vladimir Golkov Thomas Kemp Zorah Lähner 78 12 0 28 Oct 2020
Unsupervised Learning of Disentangled Speech Content and Style Representation Andros Tjandra Ruoming Pang Yu Zhang Shigeki Karita BDL DRL 73 15 0 24 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis Rui Liu Berrak Sisman Haizhou Li 96 25 0 23 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 76 103 0 22 Oct 2020
Learning Speaker Embedding from Text-to-Speech Jaejin Cho Piotr Żelasko Jesus Villalba Shinji Watanabe Najim Dehak 66 11 0 21 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS Wen-Chin Huang Tomoki Hayashi Shinji Watanabe Tomoki Toda DRL 81 40 0 06 Oct 2020
Controllable Neural Prosody Synthesis Max Morrison Zeyu Jin Justin Salamon Nicholas J. Bryan G. J. Mysore 57 20 0 07 Aug 2020
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech Tomás Nekvinda Ondrej Dusek 72 57 0 03 Aug 2020
Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling Hao Hao Tan Dorien Herremans MGen 60 74 0 29 Jul 2020
Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance Hao Hao Tan Yin-Jyun Luo Dorien Herremans 45 8 0 16 Jun 2020
Neural voice cloning with a few low-quality samples Sunghee Jung Hoi-Rim Kim 37 3 0 12 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer Mingjian Chen Xu Tan Yi Ren Jin Xu Hao Sun Sheng Zhao Tao Qin Tie-Yan Liu 65 110 0 08 Jun 2020
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning Miguel Vasco Francisco S. Melo Ana Paiva DRL 39 11 0 04 Jun 2020
Pitchtron: Towards audiobook generation from ordinary people's voices Sunghee Jung Hoi-Rim Kim 41 5 0 21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 91 67 0 18 May 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation Tao Tu Yuan-Jui Chen Alexander H. Liu Hung-yi Lee 54 7 0 16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation A. Laptev Roman Korostik A. Svischev A. Andrusenko Ivan Medennikov S. Rybin 81 61 0 14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 96 121 0 12 May 2020
Jukebox: A Generative Model for Music Prafulla Dhariwal Heewoo Jun Christine Payne Jong Wook Kim Alec Radford Ilya Sutskever VLM 171 758 0 30 Apr 2020
The Attacker's Perspective on Automatic Speaker Verification: An Overview Rohan Kumar Das Xiaohai Tian Tomi Kinnunen Haizhou Li AAML 68 80 0 19 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis Ting-Yao Hu A. Shrivastava Oncel Tuzel C. Dhir 57 32 0 09 Mar 2020
Deterministic Decoding for Discrete Data in Variational Autoencoders Daniil Polykovskiy Dmitry Vetrov OffRL 69 8 0 04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuanbin Cao Heiga Zen Yonghui Wu 56 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 98 93 0 06 Feb 2020
WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss Rui Liu Berrak Sisman F. Bao Guanglai Gao Haizhou Li 125 14 0 02 Feb 2020
Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion Wen-Chin Huang Hao Luo Hsin-Te Hwang Chen-Chou Lo Yu-Huai Peng Yu Tsao Hsin-Min Wang DRL 63 42 0 22 Jan 2020