ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.06103
  4. Cited By
Conditional Variational Autoencoder with Adversarial Learning for
  End-to-End Text-to-Speech

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
    DRL
ArXivPDFHTML

Papers citing "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

50 / 491 papers shown
Title
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Yanis Labrak
Adel Moumen
Richard Dufour
Mickael Rouvier
ELM
LM&MA
MedIm
29
0
0
09 Jun 2024
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
Bingsong Bai
Fengping Wang
Yingming Gao
Ya Li
46
0
0
09 Jun 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody
  Modeling
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
38
2
0
09 Jun 2024
Should you use a probabilistic duration model in TTS? Probably!
  Especially for spontaneous speech
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
23
1
0
08 Jun 2024
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice
  Conversion with Singer Guidance
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
Shihao Chen
Yu Gu
Jie Zhang
Na Li
Rilin Chen
Liping Chen
Lirong Dai
DiffM
40
6
0
08 Jun 2024
RU-AI: A Large Multimodal Dataset for Machine Generated Content
  Detection
RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
Liting Huang
Zhihao Zhang
Yiran Zhang
Xiyue Zhou
Shoujin Wang
NoLa
43
2
0
07 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
38
66
0
07 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech
  Synthesis
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
37
6
0
06 Jun 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis
  with Context-Aware Contrastive Language-Audio Pretraining
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
RALM
VLM
34
4
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with
  Multi-Modal Context and Large Language Model
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
40
4
0
06 Jun 2024
Harder or Different? Understanding Generalization of Audio Deepfake
  Detection
Harder or Different? Understanding Generalization of Audio Deepfake Detection
Nicolas M. Muller
Nicholas W. D. Evans
Hemlata Tak
Philip Sperl
Konstantin Böttinger
27
3
0
05 Jun 2024
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled
  Singing Voice Deepfake Detection
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Yongyi Zang
Jiatong Shi
You Zhang
Ryuichi Yamamoto
Jionghao Han
...
Shengyuan Xu
Wenxiao Zhao
Jing Guo
T. Toda
Zhiyao Duan
26
10
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and
  Zero-shot Language Style Control With Decoupled Codec
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
52
8
0
03 Jun 2024
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
Peng Wang
Songshuo Lu
Yaohua Tang
Sijie Yan
Yuanjun Xiong
Wei Xia
AuLLM
31
10
0
29 May 2024
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning
Arnav Goel
Medha Hira
Anubha Gupta
30
1
0
23 May 2024
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer
  Learning
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning
Medha Hira
Arnav Goel
Anubha Gupta
20
1
0
23 May 2024
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Wen-Chin Huang
Yi-Chiao Wu
T. Toda
40
1
0
20 May 2024
Exploring speech style spaces with language models: Emotional TTS
  without emotion labels
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra
Zongyang Du
Berrak Sisman
38
2
0
18 May 2024
Building a Luganda Text-to-Speech Model From Crowdsourced Data
Building a Luganda Text-to-Speech Model From Crowdsourced Data
Sulaiman Kagumire
Andrew Katumba
J. Nakatumba‐Nabende
John Quinn
16
1
0
16 May 2024
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based
  Speech Language Model
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Siyang Wang
Éva Székely
47
4
0
16 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection
  of Deepfake Audio
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
15
0
08 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General
  Sound
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
35
18
0
30 Apr 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
32
1
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in
  joint multimodal speech-and-gesture synthesis
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
44
4
0
30 Apr 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
36
10
0
28 Apr 2024
An Investigation of Time-Frequency Representation Discriminators for
  High-Fidelity Vocoder
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Haizhou Li
Zhizheng Wu
28
2
0
26 Apr 2024
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Xinlei Niu
Jing Zhang
Charles Patrick Martin
21
2
0
24 Apr 2024
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual
  Expressiveness Annotations
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Sen Liu
Yiwei Guo
Xie Chen
Kai Yu
24
1
0
23 Apr 2024
Voice Attribute Editing with Text Prompt
Voice Attribute Editing with Text Prompt
Zheng-Yan Sheng
Yang Ai
Li-Juan Liu
Jia Pan
Zhenhua Ling
26
6
0
13 Apr 2024
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
Kevin Zhang
Luka Chkhetiani
Francis McCann Ramirez
Yash Khare
Andrea Vanzo
...
Ruben Bousbib
Taufiquzzaman Peyash
Michael Nguyen
Dillon Pulliam
Domenic Donato
32
2
0
10 Apr 2024
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Xincan Feng
A. Yoshimoto
41
2
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving
  Zero-Shot Voice Editing
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot
  Text-to-Speech
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
70
39
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
36
21
0
03 Apr 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through
  Weighted Samplers and Consistency Models
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
32
6
0
31 Mar 2024
Causal Inference for Human-Language Model Collaboration
Causal Inference for Human-Language Model Collaboration
Bohan Zhang
Yixin Wang
Paramveer S. Dhillon
38
2
0
30 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David F. Harwath
74
57
0
25 Mar 2024
KunquDB: An Attempt for Speaker Verification in the Chinese Opera
  Scenario
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
Huali Zhou
Yuke Lin
Dongxi Liu
Ming Li
29
0
0
20 Mar 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight
  Text-to-Speech
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
35
0
0
13 Mar 2024
Automatic design optimization of preference-based subjective evaluation
  with online learning in crowdsourcing environment
Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment
Yusuke Yasuda
T. Toda
18
1
0
10 Mar 2024
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot
  Text-to-Speech with Model and Data Scaling
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
44
5
0
09 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker
  Replication
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
24
2
0
06 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
  Diffusion Models
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
43
143
0
05 Mar 2024
Brilla AI: AI Contestant for the National Science and Maths Quiz
Brilla AI: AI Contestant for the National Science and Maths Quiz
George Boateng
Jonathan Abrefah Mensah
Kevin Takyi Yeboah
William Edor
Andrew Kojo Mensah-Onumah
Naafi Dasana Ibrahim
Nana Sam Yeboah
19
2
0
04 Mar 2024
PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice
  Conversion
PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion
Tianhua Qi
Wenming Zheng
Cheng Lu
Yuan Zong
Hailun Lian
19
2
0
03 Mar 2024
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech
  Synthesis
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Wei-wei Lin
Chenhang He
Man-Wai Mak
Jiachen Lian
Kong Aik Lee
DiffM
41
0
0
01 Mar 2024
G4G:A Generic Framework for High Fidelity Talking Face Generation with
  Fine-grained Intra-modal Alignment
G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Juan Zhang
Jiahao Chen
Cheng Wang
Zhi-Yang Yu
Tangquan Qi
Di Wu
CVBM
38
0
0
28 Feb 2024
High-Fidelity Neural Phonetic Posteriorgrams
High-Fidelity Neural Phonetic Posteriorgrams
Cameron Churchwell
Max Morrison
Bryan Pardo
35
4
0
27 Feb 2024
Advancing Generative Model Evaluation: A Novel Algorithm for Realistic
  Image Synthesis and Comparison in OCR System
Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System
Majid Memari
Khaled R. Ahmed
Shahram Rahimi
Noorbakhsh Amiri Golilarz
EGVM
33
1
0
27 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing
  Different Modalities as Different Languages
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
28
6
0
25 Feb 2024
Previous
12345...8910
Next