AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

14 May 2019
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson

Papers citing "AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss"

50 / 273 papers shown
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
80
35
0
25 May 2023
Iteratively Improving Speech Recognition and Voice Conversion
Mayank Singh
Naoya Takahashi
Ono Naoyuki
51
4
0
24 May 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
Ziqian Ning
Yuepeng Jiang
Pengcheng Zhu
Jixun Yao
Shuai Wang
Linfu Xie
Mengxiao Bi
76
10
0
21 May 2023
Data Augmentation for Diverse Voice Conversion in Noisy Environments
Avani Tanna
Michael Stephen Saxon
A. El Abbadi
William Yang Wang
17
0
0
18 May 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion
Xintao Zhao
Shuai Wang
Yang Chao
Zhiyong Wu
Helen Meng
71
3
0
16 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
Wei-wei Lin
Chenhang He
Man-Wai Mak
Youzhi Tu
58
5
0
14 May 2023
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion
Zhichao Wang
Liumeng Xue
Qiuqiang Kong
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
BDL
100
3
0
12 May 2023
VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation
Yuanda Wang
Hanqing Guo
Guangjing Wang
Bocheng Chen
Qiben Yan
AAML
60
18
0
09 May 2023
Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion
Yanzhen Ren
Hongcheng Zhu
Liming Zhai
Zongkun Sun
Rubing Shen
Lina Wang
71
9
0
09 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
85
4
0
08 May 2023
Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack
Hideaki Takahashi
Jingjing Liu
Yang Liu
FedML
104
11
0
22 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
52
9
0
24 Mar 2023
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
Hyun Joon Park
Seok Woo Yang
Jin Sob Kim
Wooseok Shin
S. W. Han
68
20
0
16 Mar 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
69
5
0
09 Mar 2023
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Jun Rekimoto
96
20
0
03 Mar 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
Brendan O'Connor
S. Dixon
43
0
0
27 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Houjian Guo
Chaoran Liu
C. Ishi
H. Ishiguro
BDL
100
13
0
16 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
73
25
0
16 Feb 2023
Autodecompose: A generative self-supervised model for semantic decomposition
M. Bonyadi
SSL
41
0
0
06 Feb 2023
SPADE: Self-supervised Pretraining for Acoustic DisEntanglement
John Harvill
Jarred Barber
Arun Nair
Ramin Pishehvar
SSL
DRL
47
0
0
03 Feb 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
80
19
0
29 Dec 2022
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
104
14
0
19 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
85
10
0
14 Dec 2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
86
14
0
19 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
79
5
0
16 Nov 2022
Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder
Yuying Xie
Thomas Arildsen
Zheng-Hua Tan
47
2
0
15 Nov 2022
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
77
7
0
12 Nov 2022
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
Ziqian Ning
Qicong Xie
Pengcheng Zhu
Zhichao Wang
Liumeng Xue
Jixun Yao
Linfu Xie
Mengxiao Bi
73
18
0
09 Nov 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
134
113
0
27 Oct 2022
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance
Yuan-Jui Chen
Ming Tu
Tang-Chun Li
Xin Li
Qiuqiang Kong
Jiaxin Li
Zhichao Wang
Qiao Tian
Yuping Wang
Yuxuan Wang
80
11
0
27 Oct 2022
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
41
1
0
25 Oct 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE
Hui Lu
Disong Wang
Xixin Wu
Zhiyong Wu
Xunying Liu
Helen M. Meng
DRL
115
10
0
25 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
Chihiro Watanabe
Hirokazu Kameoka
DRL
112
0
0
20 Oct 2022
Semi-Supervised Domain Adaptation with Auto-Encoder via Simultaneous Learning
Md. Mahmudur Rahman
Yikang Shen
Mohammad Arif Ul Alam
37
5
0
18 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
72
16
0
14 Oct 2022
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
Mei-Shuo Chen
Z. Duan
105
11
0
23 Sep 2022
Non-Parallel Voice Conversion for ASR Augmentation
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
104
2
0
15 Sep 2022
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Ruibin Yuan
Yuxuan Wu
Jacob Li
Jaxter Kim
112
5
0
09 Sep 2022
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion
Sicheng Yang
Methawee Tantrawenith
Hao-Wen Zhuang
Zhiyong Wu
Aolan Sun
...
Ning Cheng
Huaizhen Tang
Xintao Zhao
Jie Wang
Helen Meng
DRL
56
39
0
18 Aug 2022
TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Zhen Zeng
Edward Xiao
Jing Xiao
92
20
0
08 Aug 2022
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
61
0
0
13 Jul 2022
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
Magdalena Proszewska
Grzegorz Beringer
Daniel Sáez-Trigueros
Thomas Merritt
Abdelhamid Ezzerg
Roberto Barra-Chicote
70
6
0
04 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers
Liumeng Xue
Shan Yang
Na Hu
Jane Polak Scowcroft
Linfu Xie
51
2
0
02 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
Weinan Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
108
27
0
29 Jun 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Dacheng Yin
Chuanxin Tang
Yanqing Liu
Xiaoqiang Wang
Zhiyuan Zhao
Yucheng Zhao
Zhiwei Xiong
Sheng Zhao
Chong Luo
83
12
0
28 Jun 2022
Data Augmentation for Dementia Detection in Spoken Language
Anna Hlédiková
Dominika Woszczyk
Alican Acman
Soteris Demetriou
Björn Schuller
70
13
0
26 Jun 2022
Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems
Danwei Cai
Zexin Cai
Ming Li
93
10
0
18 Jun 2022
End-to-End Voice Conversion with Information Perturbation
Qicong Xie
Shan Yang
Yinjiao Lei
Linfu Xie
Jane Polak Scowcroft
70
7
0
15 Jun 2022
Streaming non-autoregressive model for any-to-many voice conversion
Ziyi Chen
Haoran Miao
Pengyuan Zhang
76
9
0
15 Jun 2022
Speak Like a Dog: Human to Non-human creature Voice Conversion
Kohei Suzuki
Shoki Sakamoto
T. Taniguchi
Hirokazu Kameoka
55
3
0
09 Jun 2022