Title
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 92 4 0 03 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabryś Jaime Lorenzo-Trueba 67 5 0 29 Jul 2022
Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis Raul Fernandez David Haws Guy Lorberbom Slava Shechtman A. Sorin 51 10 0 25 Jul 2022
Controllable Data Generation by Deep Learning: A Review Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Liang Zhao 99 28 0 19 Jul 2022
GAFX: A General Audio Feature eXtractor Zhaoyang Bu Han Zhang Xiaohu Zhu 58 0 0 19 Jul 2022
Distance Learner: Incorporating Manifold Prior to Model Training Aditya Chetan Nipun Kwatra 31 1 0 14 Jul 2022
Data Augmentation for Low-Resource Quechua ASR Improvement Rodolfo Zevallos Núria Bel Guillermo Cámbara Mireia Farrús Jordi Luque VLM SyDa 21 7 0 14 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 120 201 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 87 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 78 5 0 13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 100 14 0 13 Jul 2022
A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System Yi-Chiao Wu Patrick Lumban Tobing Kazuki Yasuhara Noriyuki Matsunaga Yamato Ohtani Tomoki Toda 69 0 0 13 Jul 2022
End-to-end speech recognition modeling from de-identified data M. Flechl Shou-Chun Yin Junho Park Peter Skala 44 5 0 12 Jul 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data Naoki Makishima Satoshi Suzuki Atsushi Ando Ryo Masumura 246 5 0 11 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu Rui Xue Lei He Xu Tan Sheng Zhao 89 25 0 11 Jul 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion Wen-Chin Huang Shu-Wen Yang Tomoki Hayashi Tomoki Toda 66 17 0 10 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqiang Wang Zhou Zhao 95 10 0 08 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus Josh Meyer David Ifeoluwa Adelani Edresson Casanova A. Oktem Daniel Whitenack Julian Weber ... Victor Akinode Bernard Opoku S. Olanrewaju Jesujoba Oluwadara Alabi Shamsuddeen Hassan Muhammad 43 23 0 07 Jul 2022
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training Zewang Zhang Yibin Zheng Xinhui Li Li Lu DiffM 171 11 0 05 Jul 2022
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) Ariadna Sánchez Alessio Falai Ziyao Zhang Orazio Angelini K. Yanagisawa 90 7 0 04 Jul 2022
Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS) Ziyao Zhang Alessio Falai Ariadna Sánchez Orazio Angelini K. Yanagisawa 56 4 0 04 Jul 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Ming Jiang Linfu Xie 101 16 0 04 Jul 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech Keon Lee Kyumin Park Daeyoung Kim LM&MA 115 46 0 03 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers Liumeng Xue Shan Yang Na Hu Jane Polak Scowcroft Linfu Xie 51 2 0 02 Jul 2022
Building African Voices Perez Ogayo Graham Neubig A. Black 122 15 0 01 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS Kyle Kastner Aaron Courville 57 0 0 30 Jun 2022
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems Hyun-Wook Yoon Ohsung Kwon Hoyeon Lee Ryuichi Yamamoto Eunwoo Song Jae-Min Kim Min-Jae Hwang 128 15 0 30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder Eunwoo Song Ryuichi Yamamoto Ohsung Kwon Chan Song Min-Jae Hwang Suhyeon Oh Hyun-Wook Yoon Jin-Seob Kim Jae-Min Kim 78 7 0 30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Weinan Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 108 27 0 29 Jun 2022
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody Peter Makarov Ammar Abbas Mateusz Lajszczak Arnaud Joly S. Karlapati Alexis Moinet Thomas Drugman Penny Karanasou 89 16 0 29 Jun 2022
Expressive, Variable, and Controllable Duration Modelling in TTS Ammar Abbas Thomas Merritt Alexis Moinet S. Karlapati Ewa Muszyńska Simon Slangen Elia Gatti Thomas Drugman 65 10 0 28 Jun 2022
Show Me Your Face, And I'll Tell You How You Speak Christen Millerdurai L. A. Khaliq Timon Ulrich CVBM 102 0 0 28 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer S. Karlapati Penny Karanasou Mateusz Lajszczak Ammar Abbas Alexis Moinet Peter Makarov Raymond Li Arent van Korlaar Simon Slangen Thomas Drugman 80 15 0 27 Jun 2022
Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection Piotr Kawa Marcin Plata P. Syga AAML 95 23 0 27 Jun 2022
Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding Wei-Ping Huang Po-Chun Chen Sung-Feng Huang Hung-yi Lee 72 1 0 27 Jun 2022
Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach Rohit Arora 27 0 0 27 Jun 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms Marco Jiralerspong Gauthier Gidel VLM 81 3 0 25 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue Kentaro Mitsui Tianyu Zhao Kei Sawada Yukiya Hono Yoshihiko Nankaku K. Tokuda 67 14 0 24 Jun 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis Tae-Woo Kim Minguk Kang Gyeong-Hoon Lee AAML 174 7 0 23 Jun 2022
A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data Raviraj Joshi Ashutosh Kumar Singh 100 10 0 22 Jun 2022
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS K. Udagawa Yuki Saito Hiroshi Saruwatari 28 6 0 21 Jun 2022
Understanding Robust Learning through the Lens of Representation Similarities Christian Cianfarani A. Bhagoji Vikash Sehwag Ben Y. Zhao Prateek Mittal Haitao Zheng OOD 81 16 0 20 Jun 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History Yuto Nishimura Yuki Saito Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari AI4TS 59 8 0 16 Jun 2022
NatiQ: An End-to-end Text-to-Speech System for Arabic Ahmed Abdelali Nadir Durrani C. Demiroğlu Fahim Dalvi Hamdy Mubarak Kareem Darwish 77 14 0 15 Jun 2022
LPCSE: Neural Speech Enhancement through Linear Predictive Coding Yang Liu Na Tang Xia Chu Yang Yang Jun Wang 68 1 0 14 Jun 2022
Adversarial Audio Synthesis with Complex-valued Polynomial Networks Yongtao Wu Grigorios G. Chrysos Volkan Cevher DiffM 144 4 0 14 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks Shanghua Gao Zhong-Yu Li Qi Han Ming-Ming Cheng Liang Wang 102 35 0 14 Jun 2022
Multi-instrument Music Synthesis with Spectrogram Diffusion Curtis Hawthorne Ian Simon Adam Roberts Neil Zeghidour Josh Gardner Ethan Manilow Jesse Engel DiffM 79 51 0 11 Jun 2022
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos Alexander Waibel M. Behr Fevziye Irem Eyiokur Dogucan Yaman Tuan-Nam Nguyen Carlos Mullov Mehmet Arif Demirtas Alperen Kantarci Stefan Constantin H. K. Ekenel CVBM 69 16 0 09 Jun 2022
A New Frontier of AI: On-Device AI Training and Personalization Jijoong Moon Parichay Kapoor Ji Hoon Lee Donghak Park Seungbaek Hong Hyungyu Lee Donghyeon Jeong Sungsik Kong MyungJoo Ham 40 3 0 09 Jun 2022