Title
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech Vatsal Aggarwal Marius Cotescu N. Prateek Jaime Lorenzo-Trueba Roberto Barra-Chicote 84 31 0 28 Nov 2019
Jejueo Datasets for Machine Translation and Speech Synthesis Kyubyong Park Yo Joong Choe Jiyeon Ham 19 5 0 27 Nov 2019
Neural Percussive Synthesis Parameterised by High-Level Timbral Features António Ramires Pritish Chandna Xavier Favory Emilia Gómez Xavier Serra 69 23 0 25 Nov 2019
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features Siddharth Gururani Kilol Gupta D. Shah Z. Shakeri Jervis Pinto 68 15 0 21 Nov 2019
Emotional Voice Conversion using Multitask Learning with Text-to-speech Tae-Ho Kim Sungjae Cho Shinkook Choi Sejik Park Soo-Young Lee 92 40 0 11 Nov 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis Junjie Pan Xiang Yin Zhiling Zhang Shichao Liu Yang Zhang Zejun Ma Yuxuan Wang 47 27 0 11 Nov 2019
Teacher-Student Training for Robust Tacotron-based TTS Rui Liu Berrak Sisman Jingdong Li F. Bao Guanglai Gao Haizhou Li 109 38 0 07 Nov 2019
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework Mingbo Ma Baigong Zheng Kaibo Liu Renjie Zheng Hairong Liu Kainan Peng Kenneth Church Liang Huang 66 31 0 07 Nov 2019
Emotional speech synthesis with rich and granularized control Seyun Um Sangshin Oh Kyungguen Byun Inseon Jang C. Ahn Hong-Goo Kang 74 90 0 05 Nov 2019
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech Xin Wang Junichi Yamagishi Massimiliano Todisco Héctor Delgado A. Nautsch ... J. Bonastre Avashna Govender S. Ronanki Jing-Xuan Zhang Zhenhua Ling 83 12 0 05 Nov 2019
A comparative study of estimating articulatory movements from phoneme sequences and acoustic features Abhayjeet Singh Aravind Illa P. Ghosh 36 8 0 31 Oct 2019
A novel cross-lingual voice cloning approach with a few text-free samples Xinyong Zhou Hao Che Xiaorui Wang Lei Xie 22 4 0 29 Oct 2019
Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System Juheon Lee Hyeong-Seok Choi Junghyun Koo Kyogu Lee 35 18 0 29 Oct 2019
Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning Alexander H. Liu Tao Tu Hung-yi Lee Lin-Shan Lee SSL 105 50 0 28 Oct 2019
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment Yusuke Yasuda Xin Wang Junichi Yamagishi 21 2 0 28 Oct 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 82 149 0 26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency M. Whitehill Shuang Ma Daniel J. McDuff Yale Song 111 35 0 25 Oct 2019
Learning audio representations via phase prediction Félix de Chaumont Quitry Marco Tagliasacchi Dominik Roblek SSL AI4TS 52 10 0 25 Oct 2019
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 195 821 0 25 Oct 2019
Vision-Infused Deep Audio Inpainting Hang Zhou Ziwei Liu Lingfeng Guo Ping Luo Dahua Lin 142 88 0 24 Oct 2019
Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks Kazuhiro Nakamura Shinji Takaki Kei Hashimoto Keiichiro Oura Yoshihiko Nankaku K. Tokuda 84 19 0 24 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit Tomoki Hayashi Ryuichi Yamamoto Katsuki Inoue Takenori Yoshimura Shinji Watanabe Tomoki Toda K. Takeda Yu Zhang Xu Tan VLM 93 205 0 24 Oct 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis Eric Battenberg RJ Skerry-Ryan Soroosh Mariooryad Daisy Stanton David Kao Matt Shannon Tom Bagby 106 114 0 23 Oct 2019
Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer Merlijn Blaauw J. Bonada 73 55 0 22 Oct 2019
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis Kundan Kumar Rithesh Kumar T. Boissière L. Gestin Wei Zhen Teoh Jose M. R. Sotelo A. D. Brébisson Yoshua Bengio Aaron Courville GAN 178 961 0 08 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 68 48 0 03 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training Qingyun Dou Yiting Lu Joshua Efiong Mark Gales 62 6 0 26 Sep 2019
Speech Recognition with Augmented Synthesized Speech Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Ye Jia Pedro J. Moreno Yonghui Wu Zelin Wu 69 128 0 25 Sep 2019
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 309 240 0 25 Sep 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities Slava Shechtman A. Sorin 49 33 0 23 Sep 2019
Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade J. Pino Liezl Puzon Jiatao Gu Xutai Ma Arya D. McCarthy D. Gopinath 25 3 0 14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications Shigeki Karita Nanxin Chen Tomoki Hayashi Takaaki Hori Hirofumi Inaguma ... Ryuichi Yamamoto Xiao-fei Wang Shinji Watanabe Takenori Yoshimura Wangyou Zhang 94 722 0 13 Sep 2019
Preech: A System for Privacy-Preserving Speech Transcription Shimaa Ahmed Amrita Roy Chowdhury Kassem Fawaz P. Ramanathan 127 48 0 09 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis Chengzhu Yu Heng Lu Na Hu Meng Yu Chao Weng ... Deyi Tuo Shiyin Kang Guangzhi Lei Jane Polak Scowcroft Dong Yu CVBM 80 118 0 04 Sep 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments Yusuke Yasuda Xin Wang Junichi Yamagishi 58 8 0 30 Aug 2019
Maximizing Mutual Information for Tacotron Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Jane Polak Scowcroft Dong Yu 86 16 0 30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck Shuang Ma Daniel J. McDuff Yale Song 89 25 0 19 Aug 2019
Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder Yi-Chiao Wu Patrick Lumban Tobing Tomoki Hayashi Kazuhiro Kobayashi Tomoki Toda 128 2 0 21 Jul 2019
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis Yuki Saito Shinnosuke Takamichi Hiroshi Saruwatari 40 10 0 19 Jul 2019
Forward-Backward Decoding for Regularizing End-to-End TTS Yibin Zheng Xi Wang Lei He Shifeng Pan Frank Soong Zhengqi Wen J. Tao 41 13 0 18 Jul 2019
Hierarchical Sequence to Sequence Voice Conversion with Limited Data P. Narayanan Punarjay Chakravarty F. Charette G. Puskorius 53 3 0 15 Jul 2019
Multi-Speaker End-to-End Speech Synthesis Jihyun Park Kexin Zhao Kainan Peng Ming-Yu Liu SyDa 74 19 0 09 Jul 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu Zhiwen Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg Bhuvana Ramabhadran 76 189 0 09 Jul 2019
Speech bandwidth extension with WaveNet Archit Gupta Brendan Shillingford Yannis Assael Thomas C. Walters 60 29 0 05 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech V. Klimkov S. Ronanki Jonas Rohnke Thomas Drugman AI4TS 89 82 0 04 Jul 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features Zexin Cai Yaogen Yang Chuxiong Zhang Xiaoyi Qin Ming Li 66 26 0 03 Jul 2019
Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations Gabriel Meseguer-Brocal Geoffroy Peeters 84 61 0 02 Jul 2019
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation Yi-Chiao Wu Tomoki Hayashi Patrick Lumban Tobing Kazuhiro Kobayashi Tomoki Toda 46 16 0 01 Jul 2019
RUSLAN: Russian Spoken Language Corpus for Speech Synthesis Lenar Gabdrakhmanov Rustem Garaev E. Razinkov 42 10 0 26 Jun 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training Peng Wu Zhenhua Ling Li-Juan Liu Yuan Jiang Hong-Chuan Wu Lirong Dai 88 72 0 26 Jun 2019