Title
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement Gyeong-Hoon Lee Tae-Woo Kim Hanbin Bae Min-Ji Lee Young-Ik Kim Hoon-Young Cho VLM 79 20 0 29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis Jinhyeok Yang Jaesung Bae Taejun Bak Young-Ik Kim Hoon-Young Cho 134 37 0 29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis Taejun Bak Jaesung Bae Hanbin Bae Young-Ik Kim Hoon-Young Cho 120 17 0 29 Jun 2021
Transflower: probabilistic autoregressive dance generation with multimodal attention Guillermo Valle Pérez G. Henter Jonas Beskow A. Holzapfel Pierre-Yves Oudeyer Simon Alexanderson 128 43 0 25 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition Zhengxi Liu Y. Qian DRL 49 10 0 25 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech Raahil Shah Kamil Pokora Abdelhamid Ezzerg V. Klimkov Goeric Huybrechts Bartosz Putrycz Daniel Korzekwa Thomas Merritt 64 26 0 24 Jun 2021
Distilling the Knowledge from Conditional Normalizing Flows Dmitry Baranchuk Vladimir Aliev Artem Babenko BDL 85 2 0 24 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 77 3 0 21 Jun 2021
Non-native English lexicon creation for bilingual speech synthesis Arun Baby Pranav Jawale Saranya Vinnaitherthan Sumukh Badam Nagaraj Adiga Sharath Adavanne 44 8 0 21 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis Jian Cong Shan Yang Lei Xie Dan Su DRL 110 29 0 21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Dan Su 73 30 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 119 9 0 18 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Nanxin Chen Yu Zhang Heiga Zen Ron J. Weiss Mohammad Norouzi Najim Dehak William Chan DiffM 99 88 0 17 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model Chenye Cui Yi Ren Jinglin Liu Feiyang Chen Rongjie Huang Ming Lei Zhou Zhao 66 35 0 17 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion Zhichao Wang Xinyong Zhou Fengyu Yang Tao Li Hongqiang Du Lei Xie Wendong Gan Haitao Chen Hai Li 65 22 0 16 Jun 2021
Improving the expressiveness of neural vocoding with non-affine Normalizing Flows Adam Gabryś Yunlong Jiao V. Klimkov Daniel Korzekwa Roberto Barra-Chicote 48 1 0 16 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 55 17 0 15 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation Won Jang D. Lim Jaesam Yoon Bongwan Kim Juntae Kim 116 132 0 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 74 7 0 14 Jun 2021
HUI-Audio-Corpus-German: A high quality TTS dataset Pascal Puchtler Johannes Wirth René Peinl 65 22 0 11 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling Jingbei Li Yi Meng Chenyi Li Zhiyong Wu Helen Meng Chao Weng Dan Su 93 24 0 11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache René Peinl 48 0 0 11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 167 903 0 11 Jun 2021
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows Iván Vallés-Pérez Julian Roth Grzegorz Beringer Roberto Barra-Chicote J. Droppo 102 8 0 10 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning Rayhane Mama Marc S. Tyndel Hashiam Kadhim Cole Clifford Ragavan Thurairatnam VGen 112 12 0 08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dong Min Dong Bok Lee Eunho Yang Sung Ju Hwang 134 175 0 06 Jun 2021
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis Beáta Lőrincz Adriana Stan M. Giurgiu 33 2 0 03 Jun 2021
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis Beáta Lőrincz Adriana Stan M. Giurgiu 45 6 0 03 Jun 2021
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data N. Howard Alex Park T. Shabestary A. Gruenstein Rohit Prabhavalkar 52 17 0 01 Jun 2021
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis Chenpeng Du K. Yu 154 20 0 27 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review Jabeen Summaira Xi Li Amin Muhammad Shoib Songyuan Li Abdul Jabbar HAI 237 59 0 24 May 2021
High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling Patrick Lumban Tobing Tomoki Toda 62 8 0 20 May 2021
Speaker disentanglement in video-to-speech conversion Dan Oneaţă Adriana Stan H. Cucu 66 9 0 20 May 2021
Designing AI-based Conversational Agent for Diabetes Care in a Multilingual Context Thuy-Trinh Nguyen Kellie Sim Anthony To Yiu Kuen Ronald R. O'Donnell Suan Tee Lim Wenru Wang Hoang D. Nguyen 30 3 0 20 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation Shoule Wu Ziqiang Shi DiffM 157 11 0 17 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech Vadim Popov Ivan Vovk Vladimir Gogoryan Tasnima Sadekova Mikhail Kudinov DiffM 117 544 0 13 May 2021
Learning Robust Latent Representations for Controllable Speech Synthesis Shakti Kumar Jithin Pradeep Hussain Zaidi DRL 68 6 0 10 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework Jinyin Chen Linhui Ye Zhaoyan Ming 65 7 0 10 May 2021
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism Jinglin Liu Chengxi Li Yi Ren Feiyang Chen Zhou Zhao DiffM 187 271 0 06 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space Pol van Rijn Silvan Mertes Dominik Schiller Peter M. C. Harrison P. Larrouy-Maestri Elisabeth André Nori Jacoby 59 12 0 05 May 2021
Personalized Keyphrase Detection using Speaker and Environment Information R. Rikhye Quan Wang Qiao Liang Yanzhang He Ding Zhao Yiteng Huang A. Narayanan Ian McGraw 54 11 0 28 Apr 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks Rodrigo Mira Konstantinos Vougioukas Pingchuan Ma Stavros Petridis Björn W. Schuller Maja Pantic 112 47 0 27 Apr 2021
Daily Turking: Designing Longitudinal Daily-task Studies on Mechanical Turk H.C.M. Turner Simon Eberz Ivan Martinovic 6 0 0 26 Apr 2021
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis Kosuke Futamata Byeong-Cheol Park Ryuichi Yamamoto Kentaro Tachibana 35 14 0 26 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 94 25 0 20 Apr 2021
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset Saida Mussakhojayeva Aigerim Janaliyeva A. Mirzakhmetov Yerbolat Khassanov H. A. Varol 61 14 0 17 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction Stanislav Beliaev Boris Ginsburg 71 9 0 16 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis Yixuan Zhou Changhe Song Jingbei Li Zhiyong Wu Yanyao Bian Dan Su Helen Meng 105 6 0 14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion Tomoki Hayashi Wen-Chin Huang Kazuhiro Kobayashi Tomoki Toda 41 24 0 14 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures Nick Rossenbach Mohammad Zeineldeen Benedikt Hilmes Ralf Schlüter Hermann Ney 72 12 0 12 Apr 2021