Title
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 92 4 0 03 Aug 2022
Generative Extraction of Audio Classifiers for Speaker Identification Tejumade Afonja Lucas Bourtoule Varun Chandrasekaran Sageev Oore Nicolas Papernot AAML 61 1 0 26 Jul 2022
Controllable Data Generation by Deep Learning: A Review Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Liang Zhao 99 28 0 19 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 87 10 0 13 Jul 2022
Automatic Evaluation of Speaker Similarity Kamil Deja Ariadna Sánchez Julian Roth Marius Cotescu 50 6 0 01 Jul 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue Kentaro Mitsui Tianyu Zhao Kei Sawada Yukiya Hono Yoshihiko Nankaku K. Tokuda 67 14 0 24 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 112 40 0 30 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech Yongqian Li Cheng Yu Guangzhi Sun Hua Jiang Fanglei Sun Weiqin Zu Ying Wen Yang Yang Jun Wang 53 7 0 09 May 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation Ryo Terashima Ryuichi Yamamoto Eunwoo Song Yuma Shirahata Hyun-Wook Yoon Jae-Min Kim Kentaro Tachibana 52 16 0 21 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech Jaesung Bae Jinhyeok Yang Taejun Bak Young-Sun Joo DiffM 126 6 0 08 Apr 2022
Into-TTS: Intonation Template Based Prosody Control System Jihwan Lee Joun Yeop Lee Heejin Choi Seongkyu Mun Sangjun Park Jae-Sung Bae Chanwoo Kim 135 4 0 04 Apr 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis Yixuan Zhou Changhe Song Xiang Li Lu Zhang Zhiyong Wu Yanyao Bian Jane Polak Scowcroft Helen Meng 139 23 0 03 Apr 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning Takaaki Saeki Kentaro Tachibana Ryuichi Yamamoto 53 11 0 29 Mar 2022
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity Sungjae Kim Y.E. Kim Jewoo Jun Injung Kim 107 14 0 02 Mar 2022
SpeechPainter: Text-conditioned Speech Inpainting Zalan Borsos Matthew Sharifi Marco Tagliasacchi 93 28 0 15 Feb 2022
Unsupervised word-level prosody tagging for controllable speech synthesis Yiwei Guo Chenpeng Du Kai Yu 67 15 0 15 Feb 2022
Building Synthetic Speaker Profiles in Text-to-Speech Systems Jie Pu Yi Meng Oguz H. Elibol 40 2 0 07 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 155 18 0 24 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 79 75 0 17 Jan 2022
Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion K. Akuzawa Kotaro Onishi Keisuke Takiguchi Kohki Mametani K. Mori BDL DRL 70 7 0 06 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 81 27 0 25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 60 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 144 9 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 62 4 0 19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 72 18 0 19 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features Georgia Maniati Nikolaos Ellinas K. Markopoulos G. Vamvoukakis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 64 14 0 17 Nov 2021
Zero-shot Singing Technique Conversion Brendan O'Connor S. Dixon Georgy Fazekas 58 5 0 16 Nov 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech Mu Li Jonas Rohnke Antonio Bonafonte Mateusz Lajszczak Trevor Wood DRL 96 2 0 24 Oct 2021
Variational Predictive Routing with Nested Subjective Timescales Alexey Zakharov Qinghai Guo Zafeirios Fountas BDL AI4TS 67 9 0 21 Oct 2021
CycleFlow: Purify Information Factors by Cycle Loss Haoran Sun Chen Chen Lantian Li Dong Wang 65 1 0 18 Oct 2021
PixelPyramids: Exact Inference Models from Lossless Image Pyramids Shweta Mahajan Stefan Roth TPM 51 2 0 17 Oct 2021
Emphasis control for parallel neural TTS Shreyas Seshadri T. Raitio D. Castellani Jiangchuan Li 120 11 0 06 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 78 23 0 06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 111 16 0 06 Oct 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World Emily Wenger Max Bronckers Christian Cianfarani Jenna Cryan Angela Sha Haitao Zheng Ben Y. Zhao AAML 79 40 0 20 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit Changhan Wang Wei-Ning Hsu Yossi Adi Adam Polyak Ann Lee Peng-Jen Chen Jiatao Gu J. Pino VLM 106 32 0 14 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis Songxiang Liu Shan Yang Jane Polak Scowcroft Dong Yu AI4TS 62 10 0 08 Sep 2021
Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring Yaman Kumar Singla Avykat Gupta Shaurya Bagga Changyou Chen Balaji Krishnamurthy R. Shah 86 12 0 30 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining Zhehuai Chen Yu Zhang Andrew Rosenberg Bhuvana Ramabhadran Gary Wang Pedro J. Moreno SSL 90 36 0 27 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech Abdelhamid Ezzerg Adam Gabryś Bartosz Putrycz Daniel Korzekwa Daniel Sáez-Trigueros David McHardy Kamil Pokora Jakub Lachowicz Jaime Lorenzo-Trueba V. Klimkov 132 6 0 13 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis Julian Zaïdi Hugo Seuté Benjamin van Niekerk M. Carbonneau 61 21 0 04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis Xudong Dai Cheng Gong Longbiao Wang Kaili Zhang 34 2 0 04 Aug 2021
On Prosody Modeling for ASR+TTS based Voice Conversion Wen-Chin Huang Tomoki Hayashi Xinjian Li Shinji Watanabe Tomoki Toda 73 9 0 20 Jul 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information Qinghua Wu Quanbo Shen Jian Luan YuJun Wang 72 4 0 07 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangens S. Karlapati Thomas Drugman 67 0 0 29 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis Taejun Bak Jaesung Bae Hanbin Bae Young-Ik Kim Hoon-Young Cho 120 17 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 77 3 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 119 9 0 18 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 55 17 0 15 Jun 2021