arXiv:2106.06103
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
Papers citing
"Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"
50 / 491 papers shown
Title
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
23
4
0
14 Aug 2023
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation
Zhichao Wang
M. Dai
Keld Lundgaard
VGen
DiffM
43
2
0
12 Aug 2023
Towards an AI to Win Ghana's National Science and Maths Quiz
George Boateng
Jonathan Abrefah Mensah
Kevin Takyi Yeboah
William Edor
Andrew Kojo Mensah-Onumah
Naafi Dasana Ibrahim
Nana Sam Yeboah
11
3
0
08 Aug 2023
A Systematic Exploration of Joint-training for Singing Voice Synthesis
Yuning Wu
Yifeng Yu
Jiatong Shi
Tao Qian
Qin Jin
38
5
0
05 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
35
10
0
03 Aug 2023
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis
Ramanan Sivaguru
Vasista Sai Lodagala
S. Umesh
14
2
0
02 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
15
14
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
29
35
0
31 Jul 2023
Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation
Yuan-Ping Chen
16
1
0
30 Jul 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
23
9
0
30 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
33
44
0
14 Jul 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
Kun Song
Yinjiao Lei
Pei-Ning Chen
Yiqing Cao
Kun Wei
Yongmao Zhang
Linfu Xie
Ning Jiang
Guoqing Zhao
27
1
0
10 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
J. Barnett
16
25
0
07 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
17
5
0
03 Jul 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
Daria Diatlova
V. Shutov
26
7
0
28 Jun 2023
Two-Stage Voice Anonymization for Enhanced Privacy
F. Nespoli
Daniel Barreda
Joerg Bitzer
Patrick A. Naylor
19
3
0
28 Jun 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
16
46
0
26 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
24
6
0
25 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
28
264
0
23 Jun 2023
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer
Ammar Abbas
S. Karlapati
Bastian Schnell
Penny Karanasou
M. G. Moya
Amith Nagaraj
Ayman Boustati
Nicole Peinelt
Alexis Moinet
Thomas Drugman
25
3
0
20 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
25
2
0
14 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL
ELM
27
11
0
14 Jun 2023
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects
Xinghua Qu
Hongyang Liu
Zhu Sun
Xiang Yin
Yew-Soon Ong
Lu Lu
Zejun Ma
29
3
0
14 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
37
107
0
13 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
16
4
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
30
8
0
12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
33
282
0
11 Jun 2023
KIT's Multilingual Speech Translation System for IWSLT 2023
Danni Liu
Thai-Binh Nguyen
Sai Koneru
Enes Yavuz Ugan
Ngoc-Quan Pham
Tuan-Nam Nguyen
Tu Anh Dinh
Carlos Mullov
A. Waibel
J. Niehues
18
6
0
08 Jun 2023
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis
Junhyeok Lee
Hyeonuk Nam
Yong-Hwa Park
11
4
0
08 Jun 2023
FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator
Massa Baali
Ahmed M. Ali
14
1
0
07 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
27
6
0
07 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
29
9
0
07 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
32
73
0
06 Jun 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
40
4
0
06 Jun 2023
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong
Zhiying Huang
Qiao Tian
Chen Xu
Tom Ko
...
Lu Lu
Zejun Ma
Yuping Wang
Mingxuan Wang
Yuxuan Wang
20
23
0
05 Jun 2023
Coupled Variational Autoencoder
Xiaoran Hao
Patrick Shafto
BDL
DRL
19
4
0
05 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
14
2
0
02 Jun 2023
Text-to-Speech Pipeline for Swiss German -- A comparison
Tobias Bollinger
Jan Deriu
Manfred Vogel
DiffM
16
0
0
31 May 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
L. T. Nguyen
Thinh-Le-Gia Pham
Dat Quoc Nguyen
24
13
0
31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
25
0
0
31 May 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Guanghou Liu
Yongmao Zhang
Yinjiao Lei
Yunlin Chen
Rui Wang
Zhifei Li
Linfu Xie
16
37
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
29
26
0
30 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
27
4
0
28 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
32
1
0
26 May 2023
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
13
26
0
25 May 2023
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
23
4
0
23 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
S. Hwang
Eunho Yang
DiffM
23
16
0
23 May 2023
ADD 2023: the Second Audio Deepfake Detection Challenge
Jiangyan Yi
Jianhua Tao
Ruibo Fu
Xinrui Yan
Chenglong Wang
...
Zhengqi Wen
Shan Liang
Zheng Lian
Shuai Nie
Haizhou Li
84
94
0
23 May 2023
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
77
298
0
22 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
31
20
0
22 May 2023