An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

9 August 2020

Haizhou Li

Papers citing "An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning"

45 / 145 papers shown

Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts Paige Tuttosi Emma Hughson Akihiro Matsufuji Angelica Lim 15 4 0 10 May 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion Weida Liang Lantian Li Wenqiang Du Dong Wang 43 0 0 08 Apr 2022
HiFi-VC: High Quality ASR-Based Voice Conversion A. Kashkin I. Karpukhin S. Shishkin 21 5 0 31 Mar 2022
Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion Jiachen Lian Chunlei Zhang Dong Yu DRL 9 50 0 30 Mar 2022
DGC-vector: A new speaker embedding for zero-shot voice conversion Ruitong Xiao Haitong Zhang Yue Lin 10 11 0 18 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows Thomas Merritt Abdelhamid Ezzerg Piotr Bilinski Magdalena Proszewska Kamil Pokora Roberto Barra-Chicote Daniel Korzekwa 20 14 0 15 Mar 2022
The Vicomtech Audio Deepfake Detection System based on Wav2Vec2 for the 2022 ADD Challenge Juan M. Martín-Donas Aitor Álvarez 30 98 0 03 Mar 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen KELM 12 18 0 21 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge Ziyi Chen Hua Hua Yuxiang Zhang Ming Li Pengyuan Zhang 6 0 0 29 Jan 2022
Invertible Voice Conversion Zexin Cai Ming Li BDL 23 1 0 26 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 41 54 0 10 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion Wendong Gan Bolong Wen Yin Yan Haitao Chen Zhichao Wang Hongqiang Du Lei Xie Kaixuan Guo Hai Li 8 14 0 02 Jan 2022
Contrastive Fine-grained Class Clustering via Generative Adversarial Networks Yunji Kim Jung-Woo Ha GAN 19 13 0 30 Dec 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 17 16 0 19 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning Shijun Wang Dimche Kostadinov Damian Borth 14 10 0 27 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion Zongyang Du Berrak Sisman Kun Zhou Haizhou Li 11 24 0 20 Oct 2021
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding Sergey Nikonorov Berrak Sisman Mingyang Zhang Haizhou Li 15 2 0 13 Oct 2021
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example Hieu-Thi Luong Junichi Yamagishi 44 9 0 11 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Junchen Lu Berrak Sisman Rui Liu Mingyang Zhang Haizhou Li DiffM 30 19 0 07 Oct 2021
A Tandem Framework Balancing Privacy and Security for Voice User Interfaces Ranya Aloufi Hamed Haddadi David E. Boyle 20 2 0 21 Jul 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer Zongyang Du Berrak Sisman Kun Zhou Haizhou Li 14 20 0 08 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 351 0 29 Jun 2021
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows Iván Vallés-Pérez Julian Roth Grzegorz Beringer Roberto Barra-Chicote J. Droppo 13 8 0 10 Jun 2021
NVC-Net: End-to-End Adversarial Voice Conversion Bac Nguyen Cong Fabien Cardinaux AAML 29 41 0 02 Jun 2021
StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts Matthew Baas Herman Kamper 17 6 0 31 May 2021
Emotional Voice Conversion: Theory, Databases and ESD Kun Zhou Berrak Sisman Rui Liu Haizhou Li 15 167 0 31 May 2021
An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion Sandipan Dhar N. D. Jana Swagatam Das 17 17 0 25 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion Hirokazu Kameoka Kou Tanaka Takuhiro Kaneko 29 21 0 14 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability Rui Liu Berrak Sisman Haizhou Li 21 32 0 03 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training Kun Zhou Berrak Sisman Haizhou Li 10 27 0 31 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward Momina Masood M. Nawaz K. Malik A. Javed Aun Irtaza AAML 112 296 0 25 Feb 2021
Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks Peter Wu Paul Pu Liang Jiatong Shi Ruslan Salakhutdinov Shinji Watanabe Louis-Philippe Morency 18 8 0 22 Jan 2021
Technology-driven Alteration of Nonverbal Cues and its Effects on Negotiation Raiyan Abdul Baten E. Hoque 6 7 0 08 Dec 2020
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech Kun Zhou Berrak Sisman Haizhou Li DRL 6 40 0 03 Nov 2020
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset Kun Zhou Berrak Sisman Rui Liu Haizhou Li 8 185 0 28 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis Rui Liu Berrak Sisman Haizhou Li 13 24 0 23 Oct 2020
FastVC: Fast Voice Conversion with non-parallel data Oriol Barbany Milos Cernak 6 7 0 08 Oct 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion Yi Zhao Wen-Chin Huang Xiaohai Tian Junichi Yamagishi Rohan Kumar Das Tomi Kinnunen Zhenhua Ling T. Toda 11 204 0 28 Aug 2020
Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN Zongyang Du Kun Zhou Berrak Sisman Haizhou Li 19 8 0 11 Aug 2020
VAW-GAN for Singing Voice Conversion with Non-parallel Training Data Junchen Lu Kun Zhou Berrak Sisman Haizhou Li DRL 6 19 0 10 Aug 2020
Pretraining Techniques for Sequence-to-Sequence Voice Conversion Wen-Chin Huang Tomoki Hayashi Yi-Chiao Wu Hirokazu Kameoka T. Toda 12 38 0 07 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 9 73 0 04 Aug 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo T. Toda ViT 12 30 0 18 May 2020
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 217 239 0 25 Sep 2019
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 214 7,923 0 17 Aug 2015