Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown

LLaMA-Omni: Seamless Speech Interaction with Large Language Models Qingkai Fang Shoutao Guo Yan Zhou Zhengrui Ma Shaolei Zhang Yang Feng AuLLM 25 30 0 10 Sep 2024
SongCreator: Lyrics-based Universal Song Generation Shun Lei Yixuan Zhou Boshi Tang Max W. Y. Lam Feng Liu Hangyu Liu Jingcheng Wu Shiyin Kang Zhiyong Wu Helen Meng 44 4 0 09 Sep 2024
USTC-KXDIGIT System Description for ASVspoof5 Challenge Y. Chen Haochen Wu Nan Jiang Xiang Xia Qing Gu ... Sian Fang Yan Song Wu Guo Lin Liu Minqiang Xu 36 1 0 03 Sep 2024
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka Li-Wei Chen Hung-Shin Lee Chen-Chi Chang VLM 27 0 0 03 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis Justin Lovelace Soham Ray Kwangyoun Kim Kilian Q. Weinberger Felix Wu 34 2 0 01 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Yuancheng Wang Haoyue Zhan Liwei Liu Ruihong Zeng Haotian Guo Jiachen Zheng Qiang Zhang Shunsi Zhang Zhizheng Wu 34 38 0 01 Sep 2024
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion Yan Rong Li Liu 19 3 0 01 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation Yusheng Tian Junbin Liu Tan Lee DiffM 36 2 0 30 Aug 2024
WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding Mohan Li Cong-Thanh Do Simon Keizer Youmna Farag Svetlana Stoyanchev R. Doddipatla 35 2 0 29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling Jiachen Lian Xuanru Zhou Z. Ezzes Jet M J Vonk Brittany Morin D. Baquirin Zachary Miller M. G. Tempini Gopala Anumanchipalli AuLLM 32 1 0 29 Aug 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection Octavian Pascu Dan Oneaţă H. Cucu Nicolas M. Muller 40 1 0 28 Aug 2024
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection Xuanru Zhou Anshul Kashyap Steve Li Ayati Sharma Brittany Morin ... Z. Ezzes Zachary Miller M. G. Tempini Jiachen Lian Gopala Krishna Anumanchipalli 24 6 0 27 Aug 2024
Multi-faceted Sensory Substitution for Curb Alerting: A Pilot Investigation in Persons with Blindness and Low Vision Ligao Ruan Giles Hamilton-Fletcher Mahya Beheshti Todd E. Hudson Maurizio Porfiri JR Rizzo 38 0 0 26 Aug 2024
Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard Wonjune Kang Margaret Hughes Deb Roy 31 1 0 26 Aug 2024
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation Shihao Chen Yu Gu Jianwei Cui Jie Zhang Rilin Chen Lirong Dai 34 2 0 22 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge Johan Rohdin Lin Zhang Oldřich Plchot Vojtěch Staněk David Mihola ... Themos Stafylakis Dmitriy Beveraki Anna Silnova Jan Brukner Lukáš Burget 36 1 0 20 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility Waris Quamer Ricardo Gutierrez-Osuna 32 1 0 20 Aug 2024
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech Xin Qi Ruibo Fu Zhengqi Wen Jianhua Tao Shuchen Shi ... Yuankun Xie Yukun Liu Guanjun Li Xuefei Liu Yongwei Li 27 1 0 20 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation Jaejun Lee Yoori Oh Injune Hwang Kyogu Lee CVBM 23 1 0 19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale Xin Wang Héctor Delgado Hemlata Tak Jee-weon Jung Hye-jin Shim ... Md. Sahidullah Tomi Kinnunen Nicholas W. D. Evans K. Lee Junichi Yamagishi AAML 45 38 0 16 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Sang-Hoon Lee Ha-Yeong Choi Seong-Whan Lee OOD DiffM AI4TS 43 5 0 14 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild Jiangyan Yi Chu Yuan Zhang Jianhua Tao Chenglong Wang Xinrui Yan Yong Ren Hao Gu Junzuo Zhou 50 1 0 09 Aug 2024
MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy Risks and Maximizing Utility in Audio-Visual Data Archiving B. Owoyele Martin Schilling Rohan Sawahn Niklas Kaemer Pavel Zherebenkov Bhuvanesh Verma Wim Pouw Gerard de Melo 27 0 0 06 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training Hawraz A. Ahmad Tarik A. Rashid 33 0 0 06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG Thibault Gaudier Marie Tahon Anthony Larcher Yannick Esteve 40 0 0 05 Aug 2024
Are Bigger Encoders Always Better in Vision Large Models? Bozhou Li Hao Liang Zimo Meng Wentao Zhang VLM 38 3 0 01 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation Kohei Matsuura Takanori Ashihara Takafumi Moriya Masato Mimura Takatomo Kano A. Ogawa Marc Delcroix 19 2 0 01 Aug 2024
Generative Expressive Conversational Speech Synthesis Rui Liu Yifan Hu Yi Ren Xiang Yin Haizhou Li 56 5 0 31 Jul 2024
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization Junyan Wu Wei Lu Xiangyang Luo Rui Yang Qian Wang Xiaochun Cao 34 3 0 23 Jul 2024
dMel: Speech Tokenization made Simple Richard He Bai Tatiana Likhomanenko Ruixiang Zhang Zijin Gu Zakaria Aldeneh Navdeep Jaitly 35 4 0 22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning Shuai Wang Zheng-Shou Chen Kong Aik Lee Yan-min Qian Haizhou Li 34 4 0 21 Jul 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies Srija Anand Praveena Varadhan Ashwin Sankar Giri Raju Mitesh M. Khapra 40 1 0 18 Jul 2024
Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems Daniel Platnick Bishoy Abdelnour Eamon Earl Rahul Kumar Zahra Rezaei Thomas Tsangaris Faraj Lagum 26 0 0 18 Jul 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation Ruibo Fu Xin Qi Zhengqi Wen Jianhua Tao Tao Wang ... Xiaopeng Wang Shuchen Shi Yukun Liu Xuefei Liu Shuai Zhang 49 0 0 07 Jul 2024
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis Cong-Thanh Do Shuhei Imai R. Doddipatla Thomas Hain 20 2 0 04 Jul 2024
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features Tomoki Koriyama 39 0 0 03 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis Yinlin Guo Yening Lv Jinqiao Dou Yan Zhang Yuehai Wang 18 0 0 30 Jun 2024
When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration Philipp Allgeuer Hassan Ali Stefan Wermter LM&Ro 31 9 0 29 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech Wenbin Wang Yang Song Sanjay Jha 36 5 0 21 Jun 2024
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems Kentaro Mitsui Koh Mitsuda Toshiaki Wakatsuki Yukiya Hono Kei Sawada 36 4 0 18 Jun 2024
1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis Sewade Ogun A. Owodunni Tobi Olatunji Eniola Alese Babatunde Oladimeji Tejumade Afonja Kayode Olaleye Naome A. Etori Tosin P. Adewumi 36 4 0 17 Jun 2024
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis Taewoo Kim Choongsang Cho Young Han Lee AI4TS 33 0 0 14 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis Dehua Tao Daxin Tan Y. Yeung Xiao Chen Tan Lee 35 3 0 13 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios Cheng Gong Erica Cooper Xin Wang Chunyu Qiang Mengzhe Geng ... Jianwu Dang Marc Tessier Aidan Pine Korin Richmond Junichi Yamagishi 35 2 0 13 Jun 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems Zhengyang Chen Xuechen Liu Erica Cooper Junichi Yamagishi Yanmin Qian 43 2 0 13 Jun 2024
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals Jihwan Lee Aditya Kommineni Tiantian Feng Kleanthis Avramidis Xuan Shi Sudarsana Reddy Kadiri Shrikanth Narayanan 31 0 0 12 Jun 2024
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding Rui Wang Liping Chen Kong Aik Lee Zhen-Hua Ling 23 2 0 12 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech Ashishkumar Gudmalwar Nirmesh Shah Sai Akarsh Pankaj Wasnik R. Shah 32 1 0 12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models Runyan Yang Huibao Yang Xiqing Zhang Tiantian Ye Ying Liu Yingying Gao Shilei Zhang Chao Deng Junlan Feng 34 0 0 12 Jun 2024
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance Semin Kim Myeonghun Jeong Hyeonseung Lee Minchan Kim Byoung Jin Choi Nam Soo Kim VLM DiffM 45 1 0 10 Jun 2024