Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown

Title
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge Dake Guo J.-H. Yao Xinfa Zhu Kangxiang Xia Zhao Guo Ziyu Zhang Y. Wang Jie Liu Lei Xie 29 1 0 31 Oct 2024
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings Kangxiang Xia Dake Guo J.-H. Yao Liumeng Xue Hanzhao Li ... Lei Xie Qingqing Zhang L. Luo M. Dong Peng Sun 52 1 0 31 Oct 2024
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis Kehan Sui Jinxu Xiang Fang Jin DiffM 22 0 0 29 Oct 2024
Mitigating Unauthorized Speech Synthesis for Voice Protection Zhisheng Zhang Qianyi Yang Derui Wang Pengyang Huang Yuxin Cao Kai Ye Jie Hao AAML 16 3 0 28 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis Suparna De Ionut Bostan Nishanth Sastry 32 0 0 24 Oct 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams Srija Anand Praveen Srinivasa Varadhan Mehak Singal Mitesh M. Khapra 23 0 0 23 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization Md Mubtasim Ahasan Md Fahim Tasnim Mohiuddin A K M Mahbubur Rahman Aman Chadha Tariq Iqbal M. A. Amin Md. Mofijul Islam Amin Ahsan Ali 25 0 0 19 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS T. Nguyen Seymanur Akti Ngoc-Quan Pham A. Waibel 21 0 0 19 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding Tan Dat Nguyen Ji-Hoon Kim Jeongsoo Choi Shukjae Choi Jinseok Park Younglo Lee Joon Son Chung 26 0 0 17 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 16 1 0 17 Oct 2024
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis Yu Gu Qiushi Zhu Guangzhi Lei Chao Weng Dan Su DiffM 37 0 0 17 Oct 2024
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model Jianwei Cui Yu Gu Chao Weng Jie M. Zhang Liping Chen Lirong Dai 62 3 0 16 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone Xuyuan Li Zengqiang Shang Hua Hua Peiyang Shi Chen Yang Li Wang Pengyuan Zhang 45 2 0 16 Oct 2024
IsoChronoMeter: A simple and effective isochronic translation evaluation metric Nikolai Rozanov Vikentiy Pankov Dmitrii Mukhutdinov Dima Vypirailenko 24 1 0 14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan N. Tomashenko Xiaoxiao Miao Emmanuel Vincent Junichi Yamagishi 125 2 0 09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Yushen Chen Zhikang Niu Ziyang Ma Keqi Deng Chunhui Wang Jian Zhao Kai Yu Xie Chen 25 52 0 09 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset Anton Firc K. Malinka P. Hanáček DiffM 31 0 0 09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS Onkar Kishor Susladkar Vishesh Tripathi Biddwan Ahmed 21 0 0 09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality Nicholas Babaev Kirill Tamogashev Azat Saginbaev Ivan Shchekotov Hanbin Bae Hosang Sung WonJun Lee Hoon-Young Cho Pavel Andreev 29 2 0 08 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech Minchan Kim Myeonghun Jeong Joun Yeop Lee Nam Soo Kim 18 0 0 07 Oct 2024
Recent Advances in Speech Language Models: A Survey Wenqian Cui Dianzhi Yu Xiaoqi Jiao Ziqiao Meng Guangyan Zhang Qichao Wang Yiwen Guo Irwin King AuLLM 59 14 0 01 Oct 2024
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech Youngjae Kim Yejin Jeon Gary Geunbae Lee 29 1 0 27 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Kai Chen Yunhao Gou Runhui Huang Zhili Liu Daxin Tan ... Qun Liu Jun Yao Lu Hou Hang Xu Hang Xu AuLLM MLLM VLM 77 21 0 26 Sep 2024
Exploring synthetic data for cross-speaker style transfer in style representation based TTS Lucas Ueda Leonardo B. de M. M. Marques Flávio O. Simões Mário Uliani Neto Fernando Runstein Bianca Dal Bó Paula D. P. Costa 21 0 0 25 Sep 2024
FastTalker: Jointly Generating Speech and Conversational Gestures from Text Zixin Guo Jian Zhang 29 1 0 24 Sep 2024
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization Sotheara Leang Anderson Augusma E. Castelli Frédérique Letué Sethserey Sam Dominique Vaufreydaz 18 0 0 24 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers Nohil Park Heeseung Kim Che Hyun Lee Jooyoung Choi Jiheum Yeom Sungroh Yoon 23 2 0 24 Sep 2024
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis Zhiyong Chen Xinnuo Li Zhiqi Ai Shugong Xu DiffM 34 1 0 24 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation Hieu-Thi Luong Haoyang Li Lin Zhang Kong Aik Lee Eng Siong Chng 54 2 0 23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection Lam Pham Phat Lam Dat Tran Hieu Tang Tin Nguyen Alexander Schindler Canh Vu Alexander Polonsky Canh Vu 48 3 0 23 Sep 2024
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection Xuanru Zhou Jiachen Lian Cheol Jun Cho Jingwen Liu Zongli Ye ... Jet M J Vonk Z. Ezzes Zachary Miller M. G. Tempini Gopala Anumanchipalli 14 3 0 20 Sep 2024
Preference Alignment Improves Language Model-Based TTS Jinchuan Tian Chunlei Zhang Jiatong Shi Hao Zhang Jianwei Yu Shinji Watanabe Dong Yu 32 7 0 19 Sep 2024
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning Daewoong Kim Hao-Wen Dong Dasaem Jeong 18 0 0 19 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Sijing Chen Yuan Feng Laipeng He Tianwei He Wendi He ... Huimin Zhang Xiang Zhang Guangcheng Zhao Hongbin Zhou Pengpeng Zou 34 4 0 18 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech Xin Qi Ruibo Fu Zhengqi Wen Tao Wang Chunyu Qiang ... Xiaopeng Wang Yuankun Xie Yukun Liu Xuefei Liu Guanjun Li DiffM 28 0 0 18 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild Jee-weon Jung Yihan Wu Xin Wang Ji-Hoon Kim Soumi Maiti ... Joon Son Chung Wangyou Zhang Seyun Um Shinnosuke Takamichi Shinji Watanabe 65 1 0 18 Sep 2024
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora F. Nespoli Daniel Barreda Patrick A. Naylor 28 1 0 17 Sep 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge Shuiyun Liu Yuxiang Kong Pengcheng Guo Weiji Zhuang Peng Gao Yujun Wang Lei Xie 39 0 0 16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Yinghao Aaron Li Xilin Jiang Cong Han N. Mesgarani DiffM 29 4 0 16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection Xuanru Zhou Cheol Jun Cho Ayati Sharma Brittany Morin D. Baquirin ... Zachary Miller B. Tee M. G. Tempini Jiachen Lian Gopala Anumanchipalli 34 3 0 15 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS Zhijun Liu Shuai Wang Pengcheng Zhu Mengxiao Bi Haizhou Li VLM DiffM 38 3 0 14 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection Xinfeng Li Kai Li Yifan Zheng Chen Yan Xiaoyu Ji Wenyuan Xu 31 13 0 14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024 Henry Li Xinyuan Zexin Cai Ashi Garg Kevin Duh Leibny Paola García-Perera Sanjeev Khudanpur Nicholas Andrews Matthew Wiesner 29 3 0 13 Sep 2024
Text-To-Speech Synthesis In The Wild Jee-weon Jung Wangyou Zhang Soumi Maiti Yihan Wu Xin Wang ... Hye-jin Shim Nicholas W. D. Evans Joon Son Chung Shinnosuke Takamichi Shinji Watanabe 32 1 0 13 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling Sotirios Karapiperis Nikolaos Ellinas Alexandra Vioni Junkwang Oh Gunu Jho Inchul Hwang S. Raptis 33 0 0 13 Sep 2024
Super Monotonic Alignment Search Junhyeok Lee Hyeongju Kim 29 0 0 12 Sep 2024
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages Mahta Fetrat Qharabagh Zahra Dehghanian Hamid R. Rabiee 16 1 0 11 Sep 2024
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment Tien-Hong Lo Meng-Ting Tsai Berlin Chen 30 0 0 11 Sep 2024
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models Xin Jing Kun Zhou Andreas Triantafyllopoulos Björn W. Schuller DiffM 27 3 0 10 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection Ziwei Yan Yanjie Zhao Haoyu Wang 32 1 0 10 Sep 2024