Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding

18 May 2020

Papers citing "Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding"

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

666

20 May 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

450

01 May 2025

ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

750

29 Apr 2025

Towards Zero-Shot Text-To-Speech for Arabic Dialects

Khai Duy Doan

Abdul Waheed

Muhammad Abdul-Mageed

429

24 Jun 2024

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

Interspeech, 2024

...

394

254

07 Jun 2024

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Ammarah Hashmi

Sahibzada Adil Shahzad

Chia-Wen Lin

Yu Tsao

Hsin-Min Wang

251

07 May 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

AAAI Conference on Artificial Intelligence (AAAI), 2023

564

17 Dec 2023

Detecting Voice Cloning Attacks via Timbre Watermarking

319

06 Dec 2023

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

ACM Multimedia (ACM MM), 2023

Zhixi Cai

Shreya Ghosh

Aman Pankaj Adatia

Munawar Hayat

Abhinav Dhall

Kalin Stefanov

268

26 Nov 2023

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

Interspeech, 2023

211

26 Oct 2023

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shunwei Lei

Yixuan Zhou

Liyang Chen

Dan Luo

Zhiyong Wu

...

Shiyin Kang

213

21 Sep 2023

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

326

06 Sep 2023

Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations

Interspeech, 2023

Wen Wang

Yang Song

S. Jha

227

24 Aug 2023

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

International Conference on Learning Representations (ICLR), 2023

...

Xiang Yin

Zhou Zhao

367

14 Jul 2023

Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis

Interspeech, 2023

Seong-Hyun Park

Bohyung Kim

Tae-Hyun Oh

233

26 May 2023

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

265

21 Mar 2023

Warning: Humans Cannot Reliably Detect Speech Deepfakes

PLoS ONE, 2023

362

19 Jan 2023

Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

202

28 Oct 2022

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022

302

25 Oct 2022

Low-Resource Multilingual and Zero-Shot Multispeaker TTS

Florian Lux

Julia Koch

Ngoc Thang Vu

238

21 Oct 2022

Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Aya Watanabe

Shinnosuke Takamichi

Yuki Saito

Detai Xin

Hiroshi Saruwatari

178

18 Oct 2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

Interspeech, 2022

Yinjiao Lei

Shan Yang

Jian Cong

Linfu Xie

Jane Polak Scowcroft

DiffM

224

05 Jul 2022

Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech

Spoken Language Technology Workshop (SLT), 2022

Florian Lux

Julia Koch

Ngoc Thang Vu

259

24 Jun 2022

Fine-grained Noise Control for Multispeaker Speech Synthesis

Interspeech, 2022

Aimilios Chalamandaris

Pirros Tsiakoulis

232

11 Apr 2022

Self-supervised learning for robust voice cloning

Interspeech, 2022

...

Aimilios Chalamandaris

Pirros Tsiakoulis

SSL

270

07 Apr 2022

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

Interspeech, 2022

Yixuan Zhou

Xiang Li

Zhiyong Wu

389

03 Apr 2022

ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

Interspeech, 2022

Edresson Casanova

C. Shulby

Alexander Korolev

Arnaldo Cândido Júnior

A. S. Soares

S. Aluísio

M. Ponti

387

29 Mar 2022

Attacker Attribution of Audio Deepfakes

Interspeech, 2022

Nicolas Müller

Franziska Dieckmann

Jennifer Williams

152

28 Mar 2022

Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis

International Conference on Digital Signal Processing (DSP), 2022

Pengyu Cheng

Zhenhua Ling

229

02 Mar 2022

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Roberto Barra-Chicote

Bartek Perz

Jaime Lorenzo-Trueba

241

16 Feb 2022

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder

Zhou Zhao

126

11 Jan 2022

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

International Conference on Machine Learning (ICML), 2021

Edresson Casanova

Julian Weber

C. Shulby

Arnaldo Cândido Júnior

Eren Golge

M. Ponti

838

585

04 Dec 2021

293

07 Nov 2021

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

661

07 Nov 2021

Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information

Qinghua Wu

Quanbo Shen

Jian Luan

YuJun Wang

274

07 Jul 2021

A Survey on Neural Speech Synthesis

Xu Tan

466

446

29 Jun 2021

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

Interspeech, 2021

Arnaldo Cândido Júnior

A. S. Soares

S. Aluísio

M. Ponti

296

113

02 Apr 2021

A Survey on Machine Learning from Few Samples

Pattern Recognition (Pattern Recognit.), 2020

377

06 Sep 2020