ResearchTrend.AI
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
    DRL

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown
Title
EE-TTS: Emphatic Expressive TTS with Linguistic Information
Yifan Zhong
Chen Zhang
Xule Liu
Chenxi Sun
Weishan Deng
Haifeng Hu
Zhongqian Sun
13
3
0
20 May 2023
An Android Robot Head as Embodied Conversational Agent
Marcel Heisler
C. Becker-Asano
LM&Ro
LLMAG
29
0
0
18 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
19
1
0
18 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
26
20
0
18 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
DiffM
19
19
0
18 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
14
0
0
12 May 2023
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation
Yu-Kuan Fu
Liang-Hsuan Tseng
Jiatong Shi
Chen An Li
Tsung-Yuan Hsu
Shinji Watanabe
Hung-yi Lee
17
4
0
12 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
30
40
0
11 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
H. Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
32
3
0
09 May 2023
DiffVoice: Text-to-Speech with Latent Diffusion
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
22
22
0
23 Apr 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
18
1
0
23 Apr 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
15
221
0
18 Apr 2023
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Jiatong Shi
Yun Tang
Ann Lee
H. Inaguma
Changhan Wang
J. Pino
Shinji Watanabe
38
9
0
10 Apr 2023
DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection
Amit Kumar Singh Yadav
Kratika Bhagtani
Ziyue Xiang
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
DRL
28
6
0
06 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
21
9
0
24 Mar 2023
FaceChat: An Emotion-Aware Face-to-face Dialogue Framework
Deema Alnuhait
Qingyang Wu
Zhou Yu
14
7
0
08 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
26
7
0
06 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen
Philip N. Garner
DiffM
31
1
0
03 Mar 2023
Leveraging Large Text Corpora for End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
A. Ogawa
Marc Delcroix
Ryo Masumura
27
14
0
02 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
17
8
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
22
8
0
28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
11
6
0
28 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
17
6
0
27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Mingxuan Wang
DiffM
30
44
0
20 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Houjian Guo
Chaoran Liu
C. Ishi
H. Ishiguro
BDL
17
12
0
16 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
8
35
0
08 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
DiffM
VLM
31
85
0
31 Jan 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
17
0
30 Jan 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
16
6
0
24 Jan 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
17
22
0
20 Jan 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes
Kimberly T. Mai
Sergi D. Bray
Toby O. Davies
Lewis D. Griffin
37
40
0
19 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Hao Liu
Tao Wang
Ruibo Fu
Jiangyan Yi
Zhengqi Wen
J. Tao
18
3
0
10 Jan 2023
Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation
Miku Nishihara
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
6
1
0
05 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
43
639
0
05 Jan 2023
Source Tracing: Detecting Voice Spoofing
Tinglong Zhu
Xingming Wang
Xiaoyi Qin
Ming Li
24
10
0
16 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
Yusuke Yasuda
T. Toda
DiffM
15
7
0
16 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
18
0
0
15 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Dan Su
DiffM
13
8
0
03 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
15
12
0
30 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer
Ondřej Klejch
P. Bell
23
7
0
29 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Xuan Shi
Erica Cooper
Xin Wang
Junichi Yamagishi
Shrikanth Narayanan
25
1
0
25 Nov 2022
IMaSC -- ICFOSS Malayalam Speech Corpus
D. Gopinath
Thennal D K
Vrinda V. Nair
Swaraj K S
G. Sachin
AuLLM
18
1
0
23 Nov 2022
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
Takenori Yoshimura
Shinji Takaki
Kazuhiro Nakamura
Keiichiro Oura
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
13
7
0
21 Nov 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling
Xinfa Zhu
Yinjiao Lei
Kun Song
Yongmao Zhang
Tao Li
Linfu Xie
11
16
0
19 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S V
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
36
18
0
17 Nov 2022
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
Yiwei Guo
Chenpeng Du
Xie Chen
K. Yu
DiffM
52
39
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
18
46
0
17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
25
48
0
17 Nov 2022
Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation
Xin Yuan
Robin Feng
Mingming Ye
14
3
0
17 Nov 2022