ResearchTrend.AI
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

14 May 2019
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson

Papers citing "AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss"

50 / 273 papers shown
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Russell Sammut Bonnici
C. Saitis
Martin Benning
GAN
93
15
0
05 Sep 2021
StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition
Shoki Sakamoto
Akira Taniguchi
T. Taniguchi
Hirokazu Kameoka
BDL
63
5
0
10 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis
Xudong Dai
Cheng Gong
Longbiao Wang
Kaili Zhang
34
2
0
04 Aug 2021
Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations
L. Benaroya
Nicolas Obin
Axel Roebel
42
5
0
26 Jul 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
97
101
0
21 Jul 2021
An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation
Xiangheng He
Junjie Chen
Georgios Rizos
Björn W. Schuller
47
14
0
18 Jul 2021
Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder
Manh Luong
Viet-Anh Tran
DRL
46
16
0
11 Jul 2021
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
Hieu-Thi Luong
Junichi Yamagishi
85
0
0
25 Jun 2021
Improving robustness of one-shot voice conversion with deep discriminative speaker encoder
Hongqiang Du
Lei Xie
64
6
0
19 Jun 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
84
141
0
18 Jun 2021
Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments
Alejandro Mottini
Jaime Lorenzo-Trueba
S. Karlapati
Thomas Drugman
29
8
0
16 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion
Zhichao Wang
Xinyong Zhou
Fengyu Yang
Tao Li
Hongqiang Du
Lei Xie
Wendong Gan
Haitao Chen
Hai Li
65
22
0
16 Jun 2021
Global Rhythm Style Transfer Without Text Transcriptions
Kaizhi Qian
Yang Zhang
Shiyu Chang
Jinjun Xiong
Chuang Gan
David D. Cox
M. Hasegawa-Johnson
78
20
0
16 Jun 2021
MONCAE: Multi-Objective Neuroevolution of Convolutional Autoencoders
Daniel Dimanov
E. Balaguer-Ballester
Colin Singleton
Shahin Rostami
42
7
0
07 Jun 2021
NVC-Net: End-to-End Adversarial Voice Conversion
Bac Nguyen Cong
Fabien Cardinaux
AAML
126
42
0
02 Jun 2021
StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts
Matthew Baas
Herman Kamper
53
6
0
31 May 2021
SpeechNet: A Universal Modularized Model for Speech Processing Tasks
Yi-Chen Chen
Po-Han Chi
Shu-Wen Yang
Kai-Wei Chang
Jheng-hao Lin
Sung-Feng Huang
Da-Rong Liu
Chi-Liang Liu
Cheng-Kuang Lee
Hung-yi Lee
MoE
64
17
0
07 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM ALM
94
25
0
20 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka
Kou Tanaka
Takuhiro Kaneko
81
21
0
14 Apr 2021
NoiseVC: Towards High Quality Zero-Shot Voice Conversion
Shijun Wang
Damian Borth
DRL
75
6
0
13 Apr 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
Jheng-hao Lin
Yist Y. Lin
C. Chien
Hung-yi Lee
147
56
0
07 Apr 2021
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-Wook Kim
Seung-won Park
Junhyeok Lee
Myun-chul Joe
76
28
0
02 Apr 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
69
32
0
17 Mar 2021
Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning
Siyang Yuan
Pengyu Cheng
Ruiyi Zhang
Weituo Hao
Zhe Gan
Lawrence Carin
DRL
66
61
0
17 Mar 2021
Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads
S. Nercessian
Andy M. Sarroff
K. Werner
40
29
0
15 Mar 2021
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng
Xu Tan
Sheng Zhao
Frank Soong
Xiang-Yang Li
Tao Qin
88
96
0
27 Feb 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
202
323
0
25 Feb 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
73
60
0
25 Feb 2021
AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou
Athanasios Katsamanis
25
0
0
18 Feb 2021
Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Jie Wang
Jingbei Li
Xintao Zhao
Zhiyong Wu
Shiyin Kang
Helen Meng
DRL
123
29
0
30 Jan 2021
EmoCat: Language-agnostic Emotional Voice Conversion
Bastian Schnell
Goeric Huybrechts
Bartek Perz
Thomas Drugman
Jaime Lorenzo-Trueba
89
11
0
14 Jan 2021
AudioViewer: Learning to Visualize Sounds
Chunjin Song
Yuchi Zhang
Willis Peng
Parmis Mohaghegh
Bastian Wandt
Helge Rhodin
105
1
0
22 Dec 2020
How Far Are We from Robust Voice Conversion: A Survey
Tzu-hsien Huang
Jheng-hao Lin
Chien-yu Huang
Hung-yi Lee
96
25
0
24 Nov 2020
Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Zhichao Wang
Wenshuo Ge
Xiong Wang
Shan Yang
Wendong Gan
Haitao Chen
Hai Li
Lei Xie
Xiulin Li
CVBM
98
33
0
17 Nov 2020
Optimizing voice conversion network with cycle consistency loss of speaker identity
Hongqiang Du
Xiaohai Tian
Lei Xie
Haizhou Li
57
18
0
17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
116
21
0
08 Nov 2020
Semi-supervised Learning for Singing Synthesis Timbre
J. Bonada
Merlijn Blaauw
51
4
0
05 Nov 2020
CVC: Contrastive Learning for Non-parallel Voice Conversion
Tingle Li
Yichen Liu
Chenxu Hu
Hang Zhao
DRL
100
13
0
02 Nov 2020
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization
Yen-Hao Chen
Da-Yi Wu
Tsung-Han Wu
Hung-yi Lee
111
108
0
31 Oct 2020
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
59
37
0
28 Oct 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention
Yist Y. Lin
C. Chien
Jheng-hao Lin
Hung-yi Lee
Lin-Shan Lee
60
79
0
27 Oct 2020
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
87
82
0
22 Oct 2020
The NeteaseGames System for Voice Conversion Challenge 2020 with Vector-quantization Variational Autoencoder and WaveNet
Haitong Zhang
DRL
28
4
0
15 Oct 2020
FastVC: Fast Voice Conversion with non-parallel data
Oriol Barbany
Milos Cernak
43
7
0
08 Oct 2020
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics
Hirokazu Kameoka
Takuhiro Kaneko
Kou Tanaka
Nobukatsu Hojo
Shogo Seki
DiffM
124
21
0
06 Oct 2020
The Academia Sinica Systems of Voice Conversion for VCC2020
Yu-Huai Peng
Cheng-Hung Hu
A. Kang
Hung-Shin Lee
Pin-Yuan Chen
Yu Tsao
Hsin-Min Wang
64
2
0
06 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS
Wen-Chin Huang
Tomoki Hayashi
Shinji Watanabe
Tomoki Toda
DRL
81
40
0
06 Oct 2020
Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data
Mingyang Zhang
Yi Zhou
Li Zhao
Haizhou Li
92
53
0
30 Sep 2020
A Deep Learning Based Analysis-Synthesis Framework For Unison Singing
Pritish Chandna
Helena Cuesta
Emilia Gómez
53
4
0
21 Sep 2020
Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling
Songxiang Liu
Yuewen Cao
Disong Wang
Xixin Wu
Xunying Liu
Helen Meng
BDL
116
92
0
06 Sep 2020