NAUTILUS: a Versatile Voice Cloning System

22 May 2020

Hieu-Thi Luong

Junichi Yamagishi

Papers citing "NAUTILUS: a Versatile Voice Cloning System"

25 / 25 papers shown

AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models

181

27 Sep 2025

Dataset of News Articles with Provenance Metadata for Media Relevance Assessment

Tomas Peterka

Matyas Bohacek

247

11 Jun 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

450

01 May 2025

LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Hieu-Thi Luong

Haoyang Li

Lin Zhang

Kong Aik Lee

Eng Siong Chng

378

23 Sep 2024

Intelli-Z: Toward Intelligible Zero-Shot TTS

269

25 Jan 2024

Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis

22 Jan 2024

TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection

Zhiba Su

179

27 Jun 2023

Speech Synthesis with Mixed Emotions

IEEE Transactions on Affective Computing (IEEE TAC), 2022

Haizhou Li

372

11 Aug 2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

Interspeech, 2022

Yinjiao Lei

Shan Yang

Jian Cong

Linfu Xie

Jane Polak Scowcroft

DiffM

235

05 Jul 2022

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Interspeech, 2022

Magdalena Proszewska

Grzegorz Beringer

Daniel Sáez-Trigueros

Thomas Merritt

Abdelhamid Ezzerg

Roberto Barra-Chicote

187

04 Jul 2022

Self-supervised learning for robust voice cloning

Interspeech, 2022

...

Aimilios Chalamandaris

Pirros Tsiakoulis

SSL

272

07 Apr 2022

Improve few-shot voice cloning using multi-modal learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Haitong Zhang

Yue Lin

176

18 Mar 2022

Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video

Nature Communications (Nat Commun), 2022

Matthew Groh

Aruna Sankaranarayanan

417

25 Feb 2022

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2022

Dabiao Ma

Yitong Zhang

Meng Li

Feng Ye

123

19 Jan 2022

A Survey on Neural Speech Synthesis

Xu Tan

470

451

29 Jun 2021

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance

Hieu-Thi Luong

Junichi Yamagishi

263

25 Jun 2021

MASS: Multi-task Anthropomorphic Speech Synthesis Framework

Computer Speech and Language (CSL), 2021

Jinyin Chen

Linhui Ye

Zhaoyan Ming

154

10 May 2021

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Xu Tan

138

20 Apr 2021

Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward

567

452

25 Feb 2021

Optimizing voice conversion network with cycle consistency loss of speaker identity

Spoken Language Technology Workshop (SLT), 2020

Hongqiang Du

Xiaohai Tian

Lei Xie

Haizhou Li

237

17 Nov 2020

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

Hieu-Thi Luong

Junichi Yamagishi

206

08 Oct 2020

Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data

Mingyang Zhang

Yi Zhou

Li Zhao

Haizhou Li

276

30 Sep 2020

Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Rohan Kumar Das

254

236

28 Aug 2020

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020

Haizhou Li

643

413

09 Aug 2020

Pretraining Techniques for Sequence-to-Sequence Voice Conversion

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020

392

07 Aug 2020