Title
NewsPod: Automatic and Interactive News Podcasts Philippe Laban Elicia Ye Srujay Korlakunta John F. Canny Marti A. Hearst 54 22 0 15 Feb 2022
Distribution augmentation for low-resource expressive text-to-speech Mateusz Lajszczak Animesh Prasad Arent van Korlaar Bajibabu Bollepalli Antonio Bonafonte ... M. Nicolis Alexis Moinet Thomas Drugman Trevor Wood Elena Sokolova 61 7 0 13 Feb 2022
I'm Hearing (Different) Voices: Anonymous Voices to Protect User Privacy H.C.M. Turner Giulio Lovisotto Simon Eberz Ivan Martinovic 32 1 0 13 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis Hao-Wen Dong Cong Zhou Taylor Berg-Kirkpatrick Julian McAuley 83 17 0 12 Feb 2022
Cross-speaker style transfer for text-to-speech using data augmentation M. Ribeiro Julian Roth Giulia Comini Goeric Huybrechts Adam Gabryś Jaime Lorenzo-Trueba 74 21 0 10 Feb 2022
Building Synthetic Speaker Profiles in Text-to-Speech Systems Jie Pu Yi Meng Oguz H. Elibol 48 2 0 07 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge Ziyi Chen Hua Hua Yuxiang Zhang Ming Li Pengyuan Zhang 102 0 0 29 Jan 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition M. Soleymanpour Michael T. Johnson Rahim Soleymanpour J. Berry 82 30 0 27 Jan 2022
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis Shinnosuke Takamichi Wataru Nakata Naoko Tanji Hiroshi Saruwatari AuLLM 77 7 0 26 Jan 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention Artem Gorodetskii Ivan Ozhiganov 117 2 0 25 Jan 2022
Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals Haohan Guo Zhiping Zhou Fanbo Meng Kai-Chun Liu 100 16 0 25 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 155 18 0 24 Jan 2022
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end Rem Hida Masaki Hamada Chie Kamada E. Tsunoo Toshiyuki Sekiya Toshiyuki Kumakura 34 7 0 24 Jan 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training J. Yang Lei He 93 11 0 20 Jan 2022
MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription Dabiao Ma Yitong Zhang Meng Li Feng Ye 39 1 0 19 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 109 103 0 19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 81 75 0 17 Jan 2022
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics Saida Mussakhojayeva Yerbolat Khassanov H. A. Varol 81 13 0 15 Jan 2022
A Practical Guide to Logical Access Voice Presentation Attack Detection Xin Wang Junichi Yamagishi AAML 109 11 0 10 Jan 2022
Audio representations for deep learning in sound synthesis: A review Anastasia Natsiou Seán O'Leary AI4TS 72 18 0 07 Jan 2022
A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram Anastasia Natsiou Seán O'Leary 44 3 0 07 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion Wendong Gan Bolong Wen Yin Yan Haitao Chen Zhichao Wang Hongqiang Du Lei Xie Kaixuan Guo Hai Li 85 14 0 02 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 86 11 0 23 Dec 2021
Forensic Analysis of Synthetically Generated Western Blot Images S. Mandelli D. Cozzolino E. D. Cannas J. P. Cardenuto Daniel Moreira ... Walter J. Scheirer Anderson de Rezende Rocha L. Verdoliva Stefano Tubaro Edward J. Delp 104 21 0 16 Dec 2021
Textless Speech-to-Speech Translation on Real Data Ann Lee Hongyu Gong Paul-Ambroise Duquenne Holger Schwenk Peng-Jen Chen ... Sravya Popuri Yossi Adi J. Pino Jiatao Gu Wei-Ning Hsu 105 150 0 15 Dec 2021
Generate Point Clouds with Multiscale Details from Graph-Represented Structures Ximing Yang Zhibo Zhang Zhengfu He Cheng Jin 3DPC 53 1 0 13 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 79 23 0 09 Dec 2021
Multi-speaker Emotional Text-to-speech Synthesizer Sungjae Cho Soo-Young Lee 41 1 0 07 Dec 2021
VocBench: A Neural Vocoder Benchmark for Speech Synthesis Ehab A. AlBadawy Andrew Gibiansky Qing He Jilong Wu Ming-Ching Chang Siwei Lyu 61 12 0 06 Dec 2021
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 249 415 0 04 Dec 2021
Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images Sang-gil Lee Eunji Kim J. S. Bae J. H. Kim Sungroh Yoon OOD 73 11 0 02 Dec 2021
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey Zahra Khanjani Gabrielle Watson V. P. Janeja 61 27 0 28 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance Heeseung Kim Sungwon Kim Sungroh Yoon DiffM BDL 131 112 0 23 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 60 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 144 9 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 64 4 0 19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 72 18 0 19 Nov 2021
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control K. Markopoulos Nikolaos Ellinas Alexandra Vioni Myrsini Christidou Panos Kakoulidis ... Georgia Maniati June Sig Sung Hyoungmin Park Pirros Tsiakoulis Aimilios Chalamandaris 82 2 0 17 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features Georgia Maniati Nikolaos Ellinas K. Markopoulos G. Vamvoukakis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 70 14 0 17 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Aimilios Chalamandaris Georgia Maniati Panos Kakoulidis S. Raptis June Sig Sung Hyoungmin Park Pirros Tsiakoulis 139 37 0 17 Nov 2021
Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data Zhu Li Yuqing Zhang Mengxi Nie Ming Yan Mengnan He Ruixiong Zhang Caixia Gong 49 3 0 15 Nov 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations Felix Kreuk Adam Polyak Jade Copet Eugene Kharitonov Tu Nguyen M. Rivière Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux Yossi Adi 114 34 0 14 Nov 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning Songxiang Liu Jane Polak Scowcroft Dong Yu 64 10 0 14 Nov 2021
HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis Tajuddin Manhar Mohammed L. Nataraj Satish Chikkagoudar S. Chandrasekaran B. S. Manjunath 26 7 0 08 Nov 2021
Speaker Generation Daisy Stanton Matt Shannon Soroosh Mariooryad RJ Skerry-Ryan Eric Battenberg Tom Bagby David Kao 96 30 0 07 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech Sung-Feng Huang Chyi-Jiunn Lin Da-Rong Liu Yi-Chen Chen Hung-yi Lee 126 57 0 07 Nov 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 63 17 0 07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schonherr DiffM 204 131 0 04 Nov 2021
Voice Conversion Can Improve ASR in Very Low-Resource Settings Matthew Baas Herman Kamper 101 17 0 04 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion Benjamin van Niekerk M. Carbonneau Julian Zaïdi Matthew Baas Hugo Seuté Herman Kamper DRL 116 123 0 03 Nov 2021