Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

Interspeech (Interspeech), 2019

9 July 2019

Andrew Rosenberg

Bhuvana Ramabhadran

Papers citing "Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning"

Randomness from causally independent processes

196

06 Oct 2025

Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters

Alessio Falai

Ziyao Zhang

Akos Gangoly

134

25 Aug 2025

End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments

131

19 Aug 2025

Toward Machine Interpreting: Lessons from Human Interpreting Studies

192

11 Aug 2025

Optimizing Multilingual Text-To-Speech with Accents & Emotions

217

19 Jun 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

426

01 May 2025

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024

389

31 Dec 2024

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

257

04 Oct 2024

Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Youngjae Kim

Yejin Jeon

Gary Geunbae Lee

308

27 Sep 2024

Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing

Spoken Language Technology Workshop (SLT), 2024

336

12 Sep 2024

A multilingual training strategy for low resource Text to Speech

Asma Amalas

Mounir Ghogho

Mohamed Chetouani

Rachid Oulad Haj Thami

303

02 Sep 2024

wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

319

08 Aug 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

Xin Wang

...

Jianwu Dang

190

13 Jun 2024

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech

Ashishkumar Gudmalwar

Nirmesh Shah

Sai Akarsh

Pankaj Wasnik

R. Shah

225

12 Jun 2024

Building speech corpus with diverse voice characteristics for its prompt-based representation

Yuki Saito

Hiroshi Saruwatari

202

20 Mar 2024

Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication

Yejin Jeon

Gary Geunbae Lee

291

06 Mar 2024

G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment

305

28 Feb 2024

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Xin Wang

Longbiao Wang

307

22 Dec 2023

A Representative Study on Human Detection of Artificially Generated Media Across Countries

282

10 Dec 2023

Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

Automatic Speech Recognition & Understanding (ASRU), 2023

Lei Xie

299

06 Oct 2023

BiSinger: Bilingual Singing Voice Synthesis

Automatic Speech Recognition & Understanding (ASRU), 2023

256

25 Sep 2023

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Automatic Speech Recognition & Understanding (ASRU), 2023

Yuki Saito

Hiroshi Saruwatari

216

24 Sep 2023

CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers

Automatic Speech Recognition & Understanding (ASRU), 2023

Xintong Wang

Chang Zeng

Jun Chen

Chunhui Wang

219

22 Sep 2023

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

International Conference on Neural Information Processing (ICONIP), 2023

259

15 Sep 2023

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Jian Cong

258

02 Sep 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

265

03 Aug 2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Interspeech (Interspeech), 2023

Xiang Yin

149

27 Jun 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Interspeech (Interspeech), 2023

Sen Liu

Yiwei Guo

Chenpeng Du

Xie Chen

Kai Yu

194

25 Jun 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Interspeech (Interspeech), 2023

Xiang Yin

271

28 May 2023

Scaling Speech Technology to 1,000+ Languages

Journal of Machine Learning Research (JMLR), 2023

...

Yossi Adi

490

569

22 May 2023

MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting

183

19 May 2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Zhiyong Wu

Yuxuan Wang

284

09 May 2023

Generative AI for learning: Investigating the potential of synthetic learning videos

180

07 Apr 2023

Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

238

15 Mar 2023

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

Siyu Li

Philip S. Yu

Lichao Sun

397

773

07 Mar 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

...

423

252

07 Mar 2023

ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations

Findings (Findings), 2023

429

01 Mar 2023

CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

277

28 Feb 2023

Multilingual Multiaccented Multispeaker TTS with RADTTS

210

24 Jan 2023

Modelling low-resource accents without accent-specific TTS frontend

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

170

11 Jan 2023

Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

Fengyu Yang

Jian Luan

Yujun Wang

146

07 Dec 2022

Controllable speech synthesis by learning discrete phoneme-level prosodic representations

Speech Communication (Speech Commun.), 2022

Aimilios Chalamandaris

Pirros Tsiakoulis

P. Mastorocostas

182

29 Nov 2022

Voice-preserving Zero-shot Multiple Accent Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

296

23 Nov 2022

An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

235

06 Nov 2022

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

Aimilios Chalamandaris

Pirros Tsiakoulis

251

31 Oct 2022

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Zhehuai Chen

Andrew Rosenberg

Bhuvana Ramabhadran

306

27 Oct 2022

Explicit Intensity Control for Accented Text-to-speech

Interspeech (Interspeech), 2022

Haizhou Li

270

27 Oct 2022

SQuId: Measuring Speech Naturalness in Many Languages

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

365

12 Oct 2022

Controllable Accented Text-to-Speech Synthesis

Rui Liu

Berrak Sisman

Guanglai Gao

Haizhou Li

241

22 Sep 2022

Deep Speech Synthesis from Articulatory Representations

Interspeech (Interspeech), 2022

Peter Wu

Shinji Watanabe

Louis Goldstein

A. Black

Gopala K. Anumanchipalli

236

13 Sep 2022