arXiv: 1905.05879 (v2, latest)
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
14 May 2019
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson
Papers citing "AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss"
50 / 273 papers shown
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
Yisi Liu
Chenyang Wang
Hanjo Kim
Raniya Khan
Gopala Anumanchipalli
107
0
0
12 Jun 2025
Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
yongqiang He
Weiping Wen
25
0
0
10 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
42
0
0
01 Jun 2025
RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations
Seungmin Kim
Sohee Park
Donghyun Kim
Jisu Lee
Daeseon Choi
AAML
59
0
0
19 May 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
79
0
0
27 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
214
2
0
27 Apr 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
149
0
0
11 Apr 2025
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems
Weifei Jin
Yuxin Cao
Junjie Su
Derui Wang
Yedi Zhang
Minhui Xue
Jie Hao
Jin Song Dong
Yixian Yang
AAML
83
0
0
01 Apr 2025
Can Diffusion Models Disentangle? A Theoretical Perspective
Liming Wang
Muhammad Jehanzeb Mirza
Yishu Gong
Yuan Gong
Jiaqi Zhang
Brian Tracey
Katerina Placek
Marco Vilela
James Glass
DiffM
CoGe
120
0
0
31 Mar 2025
Text-Driven Voice Conversion via Latent State-Space Modeling
Wen Li
Sofia Martinez
Priyanka Shah
78
0
0
26 Mar 2025
ReverBERT: A State Space Model for Efficient Text-Driven Speech Style Transfer
Michael Brown
Sofia Martinez
Priya Singh
72
0
0
26 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
108
0
0
11 Mar 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Jialong Zuo
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Xize Cheng
...
Wenrui Liu
Guangyan Zhang
Zehai Tu
Yiwen Guo
Zhou Zhao
98
2
0
08 Feb 2025
Emotion Recognition and Generation: A Comprehensive Review of Face, Speech, and Text Modalities
Rebecca Mobbs
Dimitrios Makris
Vasileios Argyriou
67
0
0
02 Feb 2025
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Xinfa Zhu
Lei He
Yujia Xiao
Xi Wang
Xu Tan
Sheng Zhao
Lei Xie
DiffM
102
2
0
08 Jan 2025
CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Yuke Li
Xinfa Zhu
Hanzhao Li
Jixun Yao
WenJie Tian
XiPeng Yang
Yunlin Chen
Zhifei Li
Lei Xie
DiffM
167
0
0
28 Nov 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
101
1
0
25 Nov 2024
Zero-shot Voice Conversion with Diffusion Transformers
Songting Liu
76
3
0
15 Nov 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
102
3
0
21 Oct 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
Evgeny Burnaev
OT
57
2
0
17 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
257
3
0
09 Oct 2024
Disentangling Textual and Acoustic Features of Neural Speech Representations
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
Afra Alishahi
Ivan Titov
CoGe
76
2
0
03 Oct 2024
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
Zixin Guo
Jian Zhang
116
1
0
24 Sep 2024
On the Feasibility of Fully AI-automated Vishing Attacks
João Figueiredo
Afonso Carvalho
Daniel Castro
Daniel Gonçalves
Nuno Santos
93
5
0
20 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
90
0
0
17 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wei Dong
83
16
0
14 Sep 2024
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
Kyungguen Byun
Jason Filos
Erik Visser
Sunkuk Moon
70
0
0
10 Sep 2024
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
yongqiang He
Weiping Wen
66
0
0
03 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
105
0
0
03 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
129
61
0
01 Sep 2024
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
A. R. Bargum
Simon Lajboschitz
Cumhur Erkut
73
1
0
29 Aug 2024
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
Haowei Lou
Helen Paik
Wen Hu
Lina Yao
VLM
94
0
0
27 Aug 2024
Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard
Wonjune Kang
Margaret Hughes
Deb Roy
90
1
0
26 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
78
1
0
20 Aug 2024
DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Yin-Jyun Luo
K. Cheuk
Woosung Choi
Toshimitsu Uesaka
Keisuke Toyama
...
Chieh-Hsin Lai
Yuhta Takida
Wei-Hsiang Liao
Simon Dixon
Yuki Mitsufuji
CoGe
106
2
0
20 Aug 2024
Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics
Thomas Thebaud
Gaël Le Lan
Anthony Larcher
AAML
64
0
0
14 Aug 2024
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
Zhichao Wang
Yuanzhe Chen
Xinsheng Wang
Lei Xie
Yuping Wang
117
1
0
05 Aug 2024
Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation
Jintao Tan
Xize Cheng
Lingyu Xiong
Lei Zhu
Xiandong Li
Wenxiong Kang
Kai Gong
Minglei Li
Yi Cai
DiffM
90
2
0
03 Aug 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
Jingshu Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Yanzhe Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
119
106
0
04 Jun 2024
Robustifying Safety-Aligned Large Language Models through Clean Data Curation
Xiaoqun Liu
Jiacheng Liang
Muchao Ye
Zhaohan Xi
AAML
123
23
0
24 May 2024
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Weifei Jin
Yuxin Cao
Junjie Su
Qi Shen
Kai Ye
Derui Wang
Jie Hao
Ziyao Liu
AAML
132
2
0
15 May 2024
SingIt! Singer Voice Transformation
Amit Eliav
Aaron Taub
Renana Opochinsky
Sharon Gannot
75
0
0
07 May 2024
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
Zongyang Du
Junchen Lu
Kun Zhou
Lakshmish Kaushik
Berrak Sisman
104
1
0
02 May 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
Pengcheng Li
Jianzong Wang
Xulong Zhang
Yong Zhang
Jing Xiao
Ning Cheng
DRL
77
2
0
02 May 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
96
0
0
01 May 2024
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
Ziqi Liang
Jianzong Wang
Xulong Zhang
Yong Zhang
Ning Cheng
Jing Xiao
61
1
0
30 Apr 2024
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
Jianzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
63
0
0
30 Apr 2024
U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI
Tanja Sarcevic
Alicja Karlowicz
Rudolf Mayer
Ricardo A. Baeza-Yates
Andreas Rauber
103
7
0
22 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
104
27
0
15 Apr 2024
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
Zhiqing Hong
Rongjie Huang
Xize Cheng
Yongqi Wang
Ruiqi Li
Fuming You
Zhou Zhao
Zhimeng Zhang
68
10
0
14 Apr 2024