2106.06103
Cited By
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
Papers citing
"Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"
50 / 491 papers shown
Title
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
H. M. Zhang
...
Yichen Xiao
Yiying Zhou
Y. Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabryś
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
34
0
0
12 May 2025
SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation
Yu-Ren Guo
Wen-Kai Tai
50
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
123
0
0
27 Apr 2025
Likelihood-Free Variational Autoencoders
Chen Xu
Qiang Wang
Lijun Sun
DiffM
DRL
80
0
0
24 Apr 2025
Using Phonemes in cascaded S2S translation pipeline
Rene Pilz
Johannes Schneider
39
0
0
22 Apr 2025
Protecting Your Voice: Temporal-aware Robust Watermarking
Yue Li
Weizhi Liu
Dongdong Lin
32
0
0
21 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
62
0
0
17 Apr 2025
Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
Botao Zhao
Zuheng Kang
Yayun He
Xiaoyang Qu
Junqing Peng
Jing Xiao
Jianzong Wang
23
0
0
15 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
27
0
0
14 Apr 2025
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Haowei Lou
Hye-Young Paik
Sheng Li
Wen Hu
Lina Yao
34
0
0
11 Apr 2025
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis
Yizhong Geng
Jizhuo Xu
Zeyu Liang
Jinghan Yang
Xiaoyi Shi
Xiaoyu Shen
19
0
0
10 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
30
0
0
08 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Z. Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
51
0
0
07 Apr 2025
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
Hedi Naouara
Jean-Pierre Lorré
Jérôme Louradour
49
0
0
03 Apr 2025
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems
Weifei Jin
Yuxin Cao
Junjie Su
Derui Wang
Yedi Zhang
Minhui Xue
Jie Hao
Jin Song Dong
Yixian Yang
AAML
57
0
0
01 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
H. Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
49
0
0
29 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu
Zhiwei Lin
Yixuan Zhou
Jingbei Li
Rui Niu
Qinghua Wu
Songjun Cao
Long Ma
Zhiyong Wu
DiffM
39
0
0
27 Feb 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
63
0
0
26 Feb 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Y. Zhang
Rui Liu
Xiang Yin
Zhou Zhao
64
0
0
26 Feb 2025
Everyday Speech in the Indian Subcontinent
Utkarsh Pathak
54
1
0
24 Feb 2025
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
Wei Zhao
Pengxiang Ding
M. Zhang
Zhefei Gong
Shuanghao Bai
H. Zhao
Donglin Wang
87
6
0
24 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
93
0
0
21 Feb 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Wenxiang Guo
Yu Zhang
Changhao Pan
Rongjie Huang
Li Tang
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Zhou Zhao
99
3
0
18 Feb 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
47
1
0
17 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
72
0
0
23 Jan 2025
MathReader : Text-to-Speech for Mathematical Documents
Sieun Hyeon
Kyudan Jung
N. Kim
Hyun Gon Ryu
Jaeyoung Do
36
1
0
13 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
39
0
0
10 Jan 2025
KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning
Fangchen Yu
Ruilizhen Hu
Yidong Lin
Yuqi Ma
Zhenghao Huang
Wenye Li
27
0
0
03 Jan 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
42
3
0
26 Dec 2024
DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
Lei Yang
Shaoyang Xu
Deyi Xiong
28
0
0
25 Dec 2024
Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
N. Tomashenko
Emmanuel Vincent
Marc Tommasi
36
0
0
22 Dec 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
167
4
0
22 Dec 2024
RoboCup@Home 2024 OPL Winner NimbRo: Anthropomorphic Service Robots using Foundation Models for Perception and Planning
Raphael Memmesheimer
Jan Nogga
Bastian Patzold
Evgenii Kruzhkov
S. Bultmann
...
Jonas Bode
Bertan Karacora
Juhui Park
A. Savinykh
Sven Behnke
72
2
0
19 Dec 2024
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Y. Xu
Yizhi Zhou
Haina Zhu
H. Li
KELM
168
1
0
18 Dec 2024
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
Xiangheng He
Junjie Chen
Zixing Zhang
Björn W. Schuller
78
0
0
16 Dec 2024
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
Zhoulin Ji
Chenhao Lin
Hang Wang
Chao Shen
102
0
0
12 Dec 2024
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Ze Yuan
Yanqing Liu
Shujie Liu
Sheng Zhao
AuLLM
74
1
0
06 Dec 2024
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
Joonyong Park
Daisuke Saito
N. Minematsu
67
0
0
04 Dec 2024
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
F. Khan
Mubarak Shah
89
2
0
29 Nov 2024
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
Dragos-Alexandru Boldisor
Stefan Smeu
Dan Oneaţă
Elisabeta Oneata
98
1
0
29 Nov 2024
Zero-shot Voice Conversion with Diffusion Transformers
Songting Liu
37
2
0
15 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
29
0
0
14 Nov 2024
Evaluating Synthetic Command Attacks on Smart Voice Assistants
Zhengxian He
Ashish Kundu
M. Ahamad
ELM
AAML
26
0
0
13 Nov 2024
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models
Dongrui Han
Mingyu Cui
Jiawen Kang
Xixin Wu
Xunying Liu
H. Meng
27
1
0
12 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Y. Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
35
10
0
02 Nov 2024
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Hui-Peng Du
Ye-Xin Lu
Zhen-Hua Ling
48
2
0
01 Nov 2024