Title
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis Manh Luong Viet-Anh Tran 24 2 0 27 Sep 2021
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning Rui Li Dong Pu Minnie Huang Bill Huang 86 14 0 23 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 72 3 0 22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World Emily Wenger Max Bronckers Christian Cianfarani Jenna Cryan Angela Sha Haitao Zheng Ben Y. Zhao AAML 79 40 0 20 Sep 2021
On-device neural speech synthesis Sivanand Achanta Albert Antony L. Golipour Jiangchuan Li T. Raitio ... Francesco Rossi Jennifer Shi Jaimin Upadhyay David Winarsky Hepeng Zhang 108 17 0 17 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit Changhan Wang Wei-Ning Hsu Yossi Adi Adam Polyak Ann Lee Peng-Jen Chen Jiatao Gu J. Pino VLM 106 32 0 14 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 69 47 0 14 Sep 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration Chuanxin Tang Chong Luo Zhiyuan Zhao Dacheng Yin Yucheng Zhao Wenjun Zeng 66 9 0 12 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis Songxiang Liu Shan Yang Jane Polak Scowcroft Dong Yu AI4TS 62 10 0 08 Sep 2021
Benchmarking and challenges in security and privacy for voice biometrics J. Bonastre Héctor Delgado Nicholas W. D. Evans Tomi Kinnunen Kong Aik Lee ... Massimiliano Todisco N. Tomashenko Emmanuel Vincent Xin Wang Junichi Yamagishi 88 9 0 01 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS) Shivam Mehta Éva Székely Jonas Beskow G. Henter 102 18 0 30 Aug 2021
Integrated Speech and Gesture Synthesis Siyang Wang Simon Alexanderson Joakim Gustafson Jonas Beskow G. Henter Éva Székely 88 19 0 25 Aug 2021
One TTS Alignment To Rule Them All Rohan Badlani A. Lancucki Kevin J. Shih Rafael Valle Ming-Yu Liu Bryan Catanzaro 81 85 0 23 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues Junjie H. Xu Zhou Fang Qihang Chen Satoru Ohno Pujana Paliyawan 42 4 0 18 Aug 2021
Combining speakers of multiple languages to improve quality of neural voices Javier Latorre Charlotte Bailleul Tuuli H. Morrill Alistair Conkie Y. Stylianou 64 8 0 17 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints Ji-Hoon Kim Sang-Hoon Lee Ji-Hyun Lee Hong G Jung Seong-Whan Lee 162 6 0 16 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech Abdelhamid Ezzerg Adam Gabryś Bartosz Putrycz Daniel Korzekwa Daniel Sáez-Trigueros David McHardy Kamil Pokora Jakub Lachowicz Jaime Lorenzo-Trueba V. Klimkov 140 6 0 13 Aug 2021
RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform Youxuan Ma Zongze Ren Shugong Xu 85 40 0 12 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang Qicong Xie Jihua Zhu Lei Xie O. Scharenborg 120 19 0 09 Aug 2021
SpecMix: A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features Gwantae Kim D. Han Hanseok Ko 101 45 0 06 Aug 2021
An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures Dengfeng Ke Yuxing Lu Xudong Liu Yanyan Xu Jing Sun Cheng-Hao Cai 52 0 0 06 Aug 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning Guangyan Zhang Ying Qin Daxin Tan Tan Lee 77 4 0 05 Aug 2021
Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System Yukiya Hono Kei Hashimoto Keiichiro Oura Yoshihiko Nankaku K. Tokuda 53 39 0 05 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis Julian Zaïdi Hugo Seuté Benjamin van Niekerk M. Carbonneau 61 21 0 04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis Xudong Dai Cheng Gong Longbiao Wang Kaili Zhang 46 2 0 04 Aug 2021
Creation and Detection of German Voice Deepfakes Vanessa Barnekow Dominik Binder Niclas Kromrey Pascal Munaretto A. Schaad Felix Schmieder 23 3 0 02 Aug 2021
End to End Bangla Speech Synthesis Prithwiraj Bhattacharjee Rajan Saha Raju Arif Ahmad M. S. Rahman 39 2 0 01 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing Zhaofeng Shi 57 7 0 01 Aug 2021
Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language Huiyan Li Haohong Lin You Wang Hengyang Wang Ming Zhang Han Gao Qing Ai Zhiyuan Luo Guang Li 63 14 0 31 Jul 2021
Practical Attacks on Voice Spoofing Countermeasures Andre Kassis Urs Hengartner AAML 49 15 0 30 Jul 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 92 23 0 27 Jul 2021
Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations L. Benaroya Nicolas Obin Axel Roebel 42 5 0 26 Jul 2021
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging Csaba Zainkó L. Tóth Amin Honarmandi Shandiz G. Gosztolya Alexandra Markó Géza Németh Tamás Gábor Csapó 66 4 0 26 Jul 2021
Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds Xuan Shi Erica Cooper Junichi Yamagishi 100 7 0 24 Jul 2021
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI Joanna Rownicka Kilian Sprenkamp A. Tripiana Volodymyr Gromoglasov Timo P. Kunz 26 0 0 21 Jul 2021
SVSNet: An End-to-end Speaker Voice Similarity Assessment Model Cheng-Hung Hu Yu-Huai Peng Junichi Yamagishi Yu Tsao Hsin-Min Wang 55 5 0 20 Jul 2021
Human Perception of Audio Deepfakes Nicolas Müller Karla Markert Konstantin Böttinger 121 50 0 20 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation Ye Jia Michelle Tadmor Ramanovich Tal Remez Roi Pomerantz 105 73 0 19 Jul 2021
Parallel and High-Fidelity Text-to-Lip Generation Jinglin Liu Zhiying Zhu Yi Ren Wencan Huang Baoxing Huai N. Yuan Zhou Zhao 55 10 0 14 Jul 2021
Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging Tamás Gábor Csapó 41 2 0 12 Jul 2021
Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder Manh Luong Viet-Anh Tran DRL 54 16 0 11 Jul 2021
A Deep-Bayesian Framework for Adaptive Speech Duration Modification Ravi Shankar A. Venkataraman 45 0 0 11 Jul 2021
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis Hui Lu Zhiyong Wu Xixin Wu Xu Li Shiyin Kang Xunying Liu Helen Meng 69 12 0 07 Jul 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information Qinghua Wu Quanbo Shen Jian Luan YuJun Wang 72 4 0 07 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion Daxin Tan Liqun Deng Y. Yeung Xin Jiang Xiao Chen Tan Lee 96 41 0 04 Jul 2021
Supervised Contrastive Learning for Accented Speech Recognition Tao Han Hantao Huang Ziang Yang Wei Han 66 16 0 02 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021 Dan Liu Mengge Du Xiaoxi Li Yuchen Hu Lirong Dai 99 21 0 01 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures Prateek Verma C. Chafe 79 29 0 30 Jun 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangens S. Karlapati Thomas Drugman 67 0 0 29 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021