arXiv: 2106.06103
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
Papers citing
"Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"
50 / 491 papers shown
Title
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
Dake Guo
J.-H. Yao
Xinfa Zhu
Kangxiang Xia
Zhao Guo
Ziyu Zhang
Y. Wang
Jie Liu
Lei Xie
29
1
0
31 Oct 2024
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings
Kangxiang Xia
Dake Guo
J.-H. Yao
Liumeng Xue
Hanzhao Li
...
Lei Xie
Qingqing Zhang
L. Luo
M. Dong
Peng Sun
52
1
0
31 Oct 2024
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
Kehan Sui
Jinxu Xiang
Fang Jin
DiffM
22
0
0
29 Oct 2024
Mitigating Unauthorized Speech Synthesis for Voice Protection
Zhisheng Zhang
Qianyi Yang
Derui Wang
Pengyang Huang
Yuxin Cao
Kai Ye
Jie Hao
AAML
16
3
0
28 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
32
0
0
24 Oct 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand
Praveen Srinivasa Varadhan
Mehak Singal
Mitesh M. Khapra
23
0
0
23 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Md Mubtasim Ahasan
Md Fahim
Tasnim Mohiuddin
A K M Mahbubur Rahman
Aman Chadha
Tariq Iqbal
M. A. Amin
Md. Mofijul Islam
Amin Ahsan Ali
25
0
0
19 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
T. Nguyen
Seymanur Akti
Ngoc-Quan Pham
A. Waibel
21
0
0
19 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
26
0
0
17 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
16
1
0
17 Oct 2024
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
Yu Gu
Qiushi Zhu
Guangzhi Lei
Chao Weng
Dan Su
DiffM
37
0
0
17 Oct 2024
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model
Jianwei Cui
Yu Gu
Chao Weng
Jie M. Zhang
Liping Chen
Lirong Dai
62
3
0
16 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
45
2
0
16 Oct 2024
IsoChronoMeter: A simple and effective isochronic translation evaluation metric
Nikolai Rozanov
Vikentiy Pankov
Dmitrii Mukhutdinov
Dima Vypirailenko
24
1
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
125
2
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
25
52
0
09 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
31
0
0
09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
21
0
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
29
2
0
08 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
18
0
0
07 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
Youngjae Kim
Yejin Jeon
Gary Geunbae Lee
29
1
0
27 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
77
21
0
26 Sep 2024
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas Ueda
Leonardo B. de M. M. Marques
Flávio O. Simões
Mário Uliani Neto
Fernando Runstein
Bianca Dal Bó
Paula D. P. Costa
21
0
0
25 Sep 2024
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
Zixin Guo
Jian Zhang
29
1
0
24 Sep 2024
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization
Sotheara Leang
Anderson Augusma
E. Castelli
Frédérique Letué
Sethserey Sam
Dominique Vaufreydaz
18
0
0
24 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
Nohil Park
Heeseung Kim
Che Hyun Lee
Jooyoung Choi
Jiheum Yeom
Sungroh Yoon
23
2
0
24 Sep 2024
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis
Zhiyong Chen
Xinnuo Li
Zhiqi Ai
Shugong Xu
DiffM
34
1
0
24 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
54
2
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Alexander Polonsky
Canh Vu
48
3
0
23 Sep 2024
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
Xuanru Zhou
Jiachen Lian
Cheol Jun Cho
Jingwen Liu
Zongli Ye
...
Jet M J Vonk
Z. Ezzes
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
14
3
0
20 Sep 2024
Preference Alignment Improves Language Model-Based TTS
Jinchuan Tian
Chunlei Zhang
Jiatong Shi
Hao Zhang
Jianwei Yu
Shinji Watanabe
Dong Yu
32
7
0
19 Sep 2024
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
Daewoong Kim
Hao-Wen Dong
Dasaem Jeong
18
0
0
19 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
34
4
0
18 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Xuefei Liu
Guanjun Li
DiffM
28
0
0
18 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
65
1
0
18 Sep 2024
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
F. Nespoli
Daniel Barreda
Patrick A. Naylor
28
1
0
17 Sep 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge
Shuiyun Liu
Yuxiang Kong
Pengcheng Guo
Weiji Zhuang
Peng Gao
Yujun Wang
Lei Xie
39
0
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou
Cheol Jun Cho
Ayati Sharma
Brittany Morin
D. Baquirin
...
Zachary Miller
B. Tee
M. G. Tempini
Jiachen Lian
Gopala Anumanchipalli
34
3
0
15 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
38
3
0
14 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wenyuan Xu
31
13
0
14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
Henry Li Xinyuan
Zexin Cai
Ashi Garg
Kevin Duh
Leibny Paola García-Perera
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
29
3
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
32
1
0
13 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
33
0
0
13 Sep 2024
Super Monotonic Alignment Search
Junhyeok Lee
Hyeongju Kim
29
0
0
12 Sep 2024
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
Mahta Fetrat Qharabagh
Zahra Dehghanian
Hamid R. Rabiee
16
1
0
11 Sep 2024
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
Tien-Hong Lo
Meng-Ting Tsai
Berlin Chen
30
0
0
11 Sep 2024
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
Xin Jing
Kun Zhou
Andreas Triantafyllopoulos
Björn W. Schuller
DiffM
27
3
0
10 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection
Ziwei Yan
Yanjie Zhao
Haoyu Wang
32
1
0
10 Sep 2024