Title
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation Jianzong Wang Pengcheng Li Xulong Zhang Ning Cheng Jing Xiao 81 0 0 14 Nov 2023
CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion Anders Vestergaard Norskov Alexander Neergaard Zahid Morten Morup 65 3 0 13 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores Daniel Y. Fu Hermann Kumbong Eric N. D. Nguyen Christopher Ré VLM 100 30 0 10 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them Muhammad Ali Farooq Dan Bigioi Rishabh Jain Wang Yao Mariam Yiwere Peter Corcoran 86 0 0 08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Rishabh Jain Peter Corcoran 53 0 0 07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction Minchan Kim Myeonghun Jeong Byoung Jin Choi Dongjune Lee N. Kim AI4TS 95 12 0 06 Nov 2023
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues? Lucas Druart Léo Jacqmin Benoit Favre L. Rojas-Barahona Valentin Vielzeuf 67 0 0 03 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN Neeraj Kumar Ankur Narang Brejesh Lall DiffM 69 0 0 27 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Marek Kubis Paweł Skórzewski Marcin Sowański Tomasz Ziętkiewicz 56 6 0 25 Oct 2023
Generative Pre-training for Speech with Flow Matching Alexander H. Liu Matt Le Apoorv Vyas Bowen Shi Andros Tjandra Wei-Ning Hsu 104 36 0 25 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo Hyukhun Koh Kyomin Jung DiffM 95 0 0 23 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens Feiyu Shen Yiwei Guo Chenpeng Du Xie Chen Kai Yu 95 13 0 23 Oct 2023
An overview of text-to-speech systems and media applications Mohammad Reza Hasanabadi 28 3 0 22 Oct 2023
Energy-Based Models For Speech Synthesis Wanli Sun Zehai Tu Anton Ragni DiffM 67 1 0 19 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition Nick Rossenbach Benedikt Hilmes Ralf Schluter 59 3 0 12 Oct 2023
Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention R. Valentim Idilio Drago Marco Mellia Federico Cerutti 16 1 0 10 Oct 2023
Prosody Analysis of Audiobooks Charuta Pethe Bryan Pham Felix D Childress Yunting Yin Steven Skiena 89 1 0 10 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens Robin Algayres Yossi Adi Tu Nguyen Jade Copet Gabriel Synnaeve Benoît Sagot Emmanuel Dupoux AuLLM 119 16 0 08 Oct 2023
Unified speech and gesture synthesis using flow matching Shivam Mehta Ruibo Tu Simon Alexanderson Jonas Beskow Éva Székely G. Henter 100 3 0 08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset Ze Liu 53 1 0 08 Oct 2023
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation Aman Khullar Daniel K. Nkemelu Cuong V. Nguyen Michael L. Best 80 5 0 04 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation Roi Benita Michael Elad Joseph Keshet DiffM 115 8 0 02 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue Kentaro Mitsui Yukiya Hono Kei Sawada 88 14 0 02 Oct 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS Xin Wang Taein Kwon Wei-Ning Hsu Yossi Adi Tu Nguyen D. Bohus Emmanuel Dupoux Neel Joshi Abdelrahman Mohamed 42 4 0 29 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey Yuchen Liu Apu Kapadia Donald Williamson AAML 76 0 0 26 Sep 2023
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models Alexandre R. Ferreira Cláudio E. C. Campelo 34 1 0 22 Sep 2023
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis Yu Gu Yianrao Bian Guangzhi Lei Chao Weng Jane Polak Scowcroft DiffM 55 2 0 22 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers Xintong Wang Chang Zeng Jun Chen Chunhui Wang 71 6 0 22 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts Shunwei Lei Yixuan Zhou Liyang Chen Dan Luo Zhiyong Wu ... Shiyin Kang Tao Jiang Yahui Zhou Yuxing Han Helen M. Meng VLM 90 2 0 21 Sep 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis X. Wei Jia Jia Xiang Li Zhiyong Wu Ziyi Wang 69 1 0 21 Sep 2023
The Impact of Silence on Speech Anti-Spoofing Yuxiang Zhang Zhuo Li Jingze Lu Hua Hua Wenchao Wang Pengyuan Zhang 80 21 0 21 Sep 2023
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech Rui Liu Bin Liu Haizhou Li 53 3 0 21 Sep 2023
SpeechAlign: a Framework for Speech Translation Alignment Evaluation Belen Alastruey Aleix Sant Gerard I. Gállego David Dale Marta R. Costa-jussá AuLLM 56 3 0 20 Sep 2023
Speak While You Think: Streaming Speech Synthesis During Text Generation Avihu Dekel Slava Shechtman Raul Fernandez David Haws Zvi Kons R. Hoory 64 9 0 20 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform Yinghao Aaron Li Cong Han Xilin Jiang N. Mesgarani 101 4 0 18 Sep 2023
Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers Peter Ochieng DiffM 56 0 0 18 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech Dariusz Piotrowski Renard Korzeniowski Alessio Falai Sebastian Cygert Kamil Pokora Georgi Tinchev Ziyao Zhang K. Yanagisawa 72 1 0 15 Sep 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs Md Awsafur Rahman Bishmoy Paul Najibul Haque Sarker Zaber Ibn Abdul Hakim S. Fattah Mohammad Saquib 59 3 0 15 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions Reo Shimizu Ryuichi Yamamoto Masaya Kawamura Yuma Shirahata Hironori Doi Tatsuya Komatsu Kentaro Tachibana DiffM 95 25 0 15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features Kentaro Seki Shinnosuke Takamichi Takaaki Saeki Hiroshi Saruwatari 76 4 0 15 Sep 2023
Direct Text to Speech Translation System using Acoustic Units Victoria Mingote Pablo Gimeno Luis Vicente Sameer Khurana Antoine Laurent J. Duret 55 4 0 14 Sep 2023
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks Soumi Maiti Yifan Peng Shukjae Choi Jee-weon Jung Xuankai Chang Shinji Watanabe VLM AuLLM 125 69 0 14 Sep 2023
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems Hanqing Guo Xun Chen Junfeng Guo Li Xiao Qiben Yan 84 13 0 13 Sep 2023
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation Zhichao Wu Qiulin Li Sixing Liu Qun Yang 70 3 0 13 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms Chu Yuan Zhang Jiangyan Yi Jianhua Tao Chenglong Wang Xinrui Yan 87 8 0 13 Sep 2023
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram Zhifeng Kong Ming-Yu Liu Ambrish Dantrey Bryan Catanzaro 51 7 0 12 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation Yongqian Li Cheng Yu Guangzhi Sun Weiqin Zu Zheng Tian ... Wei Pan Chao Zhang Jun Wang Yang Yang Fanglei Sun 66 2 0 08 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network Takashi Shibuya Yuhta Takida Yuki Mitsufuji 71 11 0 06 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023 Zhihang Xu Shaofei Zhang Xi Wang Jiajun Zhang Wenning Wei Lei He Sheng Zhao 81 2 0 06 Sep 2023
Voice Morphing: Two Identities in One Voice Sushant Pani Anurag Chowdhury Morgan Sandler Arun Ross 79 1 0 05 Sep 2023