Title
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends S. Latif R. Rana Sara Khalifa Raja Jurdak Junaid Qadir Björn W. Schuller AI4TS 96 82 0 02 Jan 2020
Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders Yin-Jyun Luo Chin-Chen Hsu Kat R. Agres Dorien Herremans DRL 99 47 0 03 Dec 2019
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection Shubhi Tyagi M. Nicolis Jonas Rohnke Thomas Drugman Jaime Lorenzo-Trueba 77 32 0 02 Dec 2019
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech Vatsal Aggarwal Marius Cotescu N. Prateek Jaime Lorenzo-Trueba Roberto Barra-Chicote 93 31 0 28 Nov 2019
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features Siddharth Gururani Kilol Gupta D. Shah Z. Shakeri Jervis Pinto 68 15 0 21 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit Tomoki Hayashi Ryuichi Yamamoto Katsuki Inoue Takenori Yoshimura Shinji Watanabe Tomoki Toda K. Takeda Yu Zhang Xu Tan VLM 93 205 0 24 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 68 48 0 03 Oct 2019
Speech Recognition with Augmented Synthesized Speech Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Ye Jia Pedro J. Moreno Yonghui Wu Zelin Wu 69 128 0 25 Sep 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities Slava Shechtman A. Sorin 56 33 0 23 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis Chengzhu Yu Heng Lu Na Hu Meng Yu Chao Weng ... Deyi Tuo Shiyin Kang Guangzhi Lei Jane Polak Scowcroft Dong Yu CVBM 89 118 0 04 Sep 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu Zhiwen Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg Bhuvana Ramabhadran 76 189 0 09 Jul 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach Noé Tits 40 10 0 05 Jul 2019
Improving Performance of End-to-End ASR on Numeric Sequences Cal Peyser Hao Zhang Tara N. Sainath Zelin Wu AI4TS 63 36 0 01 Jul 2019
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders Yin-Jyun Luo Kat R. Agres Dorien Herremans 103 46 0 19 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis Zack Hodari O. Watts Simon King 67 29 0 10 Jun 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis Eric Battenberg Soroosh Mariooryad Daisy Stanton RJ Skerry-Ryan Matt Shannon David Kao Tom Bagby BDL 107 45 0 08 Jun 2019
MelNet: A Generative Model for Audio in the Frequency Domain Sean Vasquez M. Lewis DiffM 85 132 0 04 Jun 2019
Non-Autoregressive Neural Text-to-Speech Kainan Peng Ming-Yu Liu Z. Song Kexin Zhao 101 40 0 21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network V. Wan Chun-an Chan Tom Kenter Jakub Vít R. Clark 71 75 0 17 May 2019
Direct speech-to-speech translation with a sequence-to-sequence model Ye Jia Ron J. Weiss Fadi Biadsy Wolfgang Macherey Melvin Johnson Zhiwen Chen Yonghui Wu 101 230 0 12 Apr 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech Heiga Zen Viet Dang R. Clark Yu Zhang Ron J. Weiss Ye Jia Zhiwen Chen Yonghui Wu 164 959 0 05 Apr 2019
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data N. Prateek Mateusz Lajszczak Roberto Barra-Chicote Thomas Drugman Jaime Lorenzo-Trueba Thomas Merritt S. Ronanki Trevor Wood 87 30 0 04 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis Yanyao Bian Changbin Chen Yongguo Kang Zhenglin Pan 77 46 0 04 Apr 2019
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis Noé Tits Fengna Wang Kevin El Haddad Vincent Pagel Thierry Dutoit DiffM 88 39 0 27 Mar 2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling Jonathan Shen Patrick Nguyen Yonghui Wu Zhiwen Chen Mengzhao Chen ... William Chan Shubham Toshniwal Baohua Liao M. Nirschl Pat Rondon VLM 113 211 0 21 Feb 2019
Unsupervised speech representation learning using WaveNet autoencoders J. Chorowski Ron J. Weiss Samy Bengio Aaron van den Oord SSL 76 319 0 25 Jan 2019
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation Ye Jia Melvin Johnson Wolfgang Macherey Ron J. Weiss Yuan Cao Chung-Cheng Chiu Naveen Ari Stella Laurenzo Yonghui Wu 98 163 0 05 Nov 2018
A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes B. Gerazov Gérard Bailly Omar Mohammed Yi Xu Philip N. Garner 63 7 0 22 Jun 2018