Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

Interspeech, 2019
9 July 2019
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Zhiwen Chen
RJ Skerry-Ryan
Ye Jia
Andrew Rosenberg
Bhuvana Ramabhadran
arXiv: 1907.04448

Papers citing "Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning"

50 / 100 papers shown
Randomness from causally independent processes
Martin Sandfuchs
Carla Ferradini
R. Renner
CML
196
0
0
06 Oct 2025
Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
Alessio Falai
Ziyao Zhang
Akos Gangoly
134
0
0
25 Aug 2025
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
Meng-Ping Lin
Enoch Hsin-Ho Huang
Shao-Yi Chien
Yu Tsao
131
0
0
19 Aug 2025
Toward Machine Interpreting: Lessons from Human Interpreting Studies
Matthias Sperber
Maureen de Seyssel
Jiajun Bao
Matthias Paulik
AI4CE
192
2
0
11 Aug 2025
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Pranav Pawar
Akshansh Dwivedi
Jenish Boricha
Himanshu Gohil
Aditya Dubey
217
1
0
19 Jun 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
426
6
0
01 May 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
389
2
0
31 Dec 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Taejun Bak
Youngsik Eom
SeungJae Choi
Young-Sun Joo
257
2
0
04 Oct 2024
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Youngjae Kim
Yejin Jeon
Gary Geunbae Lee
308
1
0
27 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Spoken Language Technology Workshop (SLT), 2024
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
336
10
0
12 Sep 2024
A multilingual training strategy for low resource Text to Speech
Asma Amalas
Mounir Ghogho
Mohamed Chetouani
Rachid Oulad Haj Thami
303
3
0
02 Sep 2024
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Khai-Nguyen Nguyen
Quy-Anh Dang
Tan-Hanh Pham
Truong-Son Hy
319
1
0
08 Aug 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
190
5
0
13 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar
Nirmesh Shah
Sai Akarsh
Pankaj Wasnik
R. Shah
225
5
0
12 Jun 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
202
1
0
20 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
291
2
0
06 Mar 2024
G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Juan Zhang
Jiahao Chen
Cheng Wang
Zhi-Yang Yu
Tangquan Qi
Di Wu
CVBM
305
0
0
28 Feb 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
307
39
0
22 Dec 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries
Joel Frank
Franziska Herbert
Jonas Ricker
Lea Schonherr
Thorsten Eisenhofer
Asja Fischer
Markus Dürmuth
Thorsten Holz
282
35
0
10 Dec 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
299
6
0
06 Oct 2023
BiSinger: Bilingual Singing Voice Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2023
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
256
7
0
25 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Automatic Speech Recognition & Understanding (ASRU), 2023
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
216
15
0
24 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Automatic Speech Recognition & Understanding (ASRU), 2023
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
219
8
0
22 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
International Conference on Neural Information Processing (ICONIP), 2023
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
259
1
0
15 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
258
18
0
02 Sep 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
265
10
0
03 Aug 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech
Interspeech, 2023
Yahuan Cong
Haoyu Zhang
Hao-Ping Lin
Shichao Liu
Chunfeng Wang
Yi Ren
Xiang Yin
Zejun Ma
149
1
0
27 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Interspeech, 2023
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
194
11
0
25 Jun 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Interspeech, 2023
Kun Song
Yi Ren
Yinjiao Lei
Chunfeng Wang
Kun Wei
Linfu Xie
Xiang Yin
Zejun Ma
271
11
0
28 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of Machine Learning Research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
490
569
0
22 May 2023
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah
Vishal Tambrahalli
Saiteja Kosgi
N. Pedanekar
Vineet Gandhi
183
1
0
19 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
284
6
0
09 May 2023
Generative AI for learning: Investigating the potential of synthetic learning videos
Daniel Leiker
Ashley Ricker Gyllen
Ismail Eldesouky
M. Cukurova
180
30
0
07 Apr 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Suhee Jo
Younggun Lee
Yookyung Shin
Yeongtae Hwang
Taesu Kim
238
7
0
15 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
397
773
0
07 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
423
252
0
07 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
Findings, 2023
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
429
11
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
277
10
0
28 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
210
7
0
24 Jan 2023
Modelling low-resource accents without accent-specific TTS frontend
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Georgi Tinchev
Marta Czarnowska
Kamil Deja
K. Yanagisawa
Marius Cotescu
170
5
0
11 Jan 2023
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
146
1
0
07 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Speech Communication, 2022
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
182
10
0
29 Nov 2022
Voice-preserving Zero-shot Multiple Accent Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Mumin Jin
Prashant Serai
Jilong Wu
Andros Tjandra
Vimal Manohar
Qing He
296
20
0
23 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
235
2
0
06 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
251
3
0
31 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
306
22
0
27 Oct 2022
Explicit Intensity Control for Accented Text-to-speech
Interspeech, 2022
Rui Liu
Haolin Zuo
De Hu
Guanglai Gao
Haizhou Li
270
9
0
27 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
365
27
0
12 Oct 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
241
6
0
22 Sep 2022
Deep Speech Synthesis from Articulatory Representations
Interspeech, 2022
Peter Wu
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
236
34
0
13 Sep 2022