ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.04448
  4. Cited By
Learning to Speak Fluently in a Foreign Language: Multilingual Speech
  Synthesis and Cross-Language Voice Cloning
v1v2 (latest)

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

9 July 2019
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Zhiwen Chen
RJ Skerry-Ryan
Ye Jia
Andrew Rosenberg
Bhuvana Ramabhadran
ArXiv (abs)PDFHTML

Papers citing "Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning"

50 / 96 papers shown
Title
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Pranav Pawar
Akshansh Dwivedi
Jenish Boricha
Himanshu Gohil
Aditya Dubey
5
0
0
19 Jun 2025
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
116
0
0
31 Dec 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Taejun Bak
Youngsik Eom
SeungJae Choi
Young-Sun Joo
47
1
0
04 Oct 2024
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual
  and Low-Resource Text-to-Speech
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
Youngjae Kim
Yejin Jeon
Gary Geunbae Lee
58
1
0
27 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in
  Cross-Lingual Speech Anti-Spoofing
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
100
2
0
12 Sep 2024
A multilingual training strategy for low resource Text to Speech
A multilingual training strategy for low resource Text to Speech
Asma Amalas
Mounir Ghogho
Mohamed Chetouani
Rachid Oulad Haj Thami
68
2
0
02 Sep 2024
wav2graph: A Framework for Supervised Learning Knowledge Graph from
  Speech
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Khai-Nguyen Nguyen
Quy-Anh Dang
Tan-Hanh Pham
Truong-Son Hy
77
0
0
08 Aug 2024
An Initial Investigation of Language Adaptation for TTS Systems under
  Low-resource Scenarios
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
54
2
0
13 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual
  Text-to-Speech
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar
Nirmesh Shah
Sai Akarsh
Pankaj Wasnik
R. Shah
53
3
0
12 Jun 2024
Building speech corpus with diverse voice characteristics for its
  prompt-based representation
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
65
1
0
20 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker
  Replication
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
58
2
0
06 Mar 2024
G4G:A Generic Framework for High Fidelity Talking Face Generation with
  Fine-grained Intra-modal Alignment
G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Juan Zhang
Jiahao Chen
Cheng Wang
Zhi-Yang Yu
Tangquan Qi
Di Wu
CVBM
71
0
0
28 Feb 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis
  Conditioned on Self-supervised Discrete Speech Representations
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
116
25
0
22 Dec 2023
A Representative Study on Human Detection of Artificially Generated
  Media Across Countries
A Representative Study on Human Detection of Artificially Generated Media Across Countries
Joel Frank
Franziska Herbert
Jonas Ricker
Lea Schonherr
Thorsten Eisenhofer
Asja Fischer
Markus Dürmuth
Thorsten Holz
86
15
0
10 Dec 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
84
3
0
06 Oct 2023
BiSinger: Bilingual Singing Voice Synthesis
BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
48
5
0
25 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics
  Description for Prompt-based Control
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
52
11
0
24 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice
  Synthesizer Trained on Monolingual Singers
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
61
6
0
22 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for
  Robust Polyglot Text-To-Speech
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
65
1
0
15 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for
  Text-to-Speech -- A Study between English and Mandarin
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
78
9
0
02 Sep 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text
  Representation Learning with Unit-to-Unit Translation
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
94
10
0
03 Aug 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style
  Generalization in Cross-Lingual Text-to-Speech
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech
Yahuan Cong
Haoyu Zhang
Hao-Ping Lin
Shichao Liu
Chunfeng Wang
Yi Ren
Xiang Yin
Zejun Ma
39
1
0
27 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
88
6
0
25 Jun 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech
  Translation
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Kun Song
Yi Ren
Yinjiao Lei
Chunfeng Wang
Kun Wei
Linfu Xie
Xiang Yin
Zejun Ma
67
9
0
28 May 2023
Scaling Speech Technology to 1,000+ Languages
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
164
360
0
22 May 2023
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low
  Resource Setting
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah
Vishal Tambrahalli
Saiteja Kosgi
N. Pedanekar
Vineet Gandhi
65
0
0
19 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with
  Bidirectional Attention Mechanism for Automatic Dubbing
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
65
3
0
09 May 2023
Generative AI for learning: Investigating the potential of synthetic
  learning videos
Generative AI for learning: Investigating the potential of synthetic learning videos
Daniel Leiker
Ashley Ricker Gyllen
Ismail Eldesouky
M. Cukurova
24
22
0
07 Apr 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
Suhee Jo
Younggun Lee
Yookyung Shin
Yeongtae Hwang
Taesu Kim
45
4
0
15 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of
  Generative AI from GAN to ChatGPT
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
102
552
0
07 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec
  Language Modeling
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
98
187
0
07 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised
  representations
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
74
8
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for
  Cross-lingual Speech Synthesis
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
74
9
0
28 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Francesco Ferroni
Bryan Catanzaro
58
6
0
24 Jan 2023
Modelling low-resource accents without accent-specific TTS frontend
Modelling low-resource accents without accent-specific TTS frontend
Georgi Tinchev
Marta Czarnowska
Kamil Deja
K. Yanagisawa
Marius Cotescu
69
4
0
11 Jan 2023
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
46
1
0
07 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level
  prosodic representations
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
46
7
0
29 Nov 2022
Voice-preserving Zero-shot Multiple Accent Conversion
Voice-preserving Zero-shot Multiple Accent Conversion
Mumin Jin
Prashant Serai
Jilong Wu
Andros Tjandra
Vimal Manohar
Qing He
60
13
0
23 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems
  via Vowel Space
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
65
2
0
06 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
61
2
0
31 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised
  Learning for Text-To-Speech
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
126
20
0
27 Oct 2022
Explicit Intensity Control for Accented Text-to-speech
Explicit Intensity Control for Accented Text-to-speech
Rui Liu
Haolin Zuo
De Hu
Guanglai Gao
Haizhou Li
92
7
0
27 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages
SQuId: Measuring Speech Naturalness in Many Languages
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
74
18
0
12 Oct 2022
Controllable Accented Text-to-Speech Synthesis
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
74
6
0
22 Sep 2022
Deep Speech Synthesis from Articulatory Representations
Deep Speech Synthesis from Articulatory Representations
Peter Wu
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
78
26
0
13 Sep 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical
  Approach For Accent Transfer Tasks
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
L. Finkelstein
Heiga Zen
Norman Casagrande
Chun-an Chan
Ye Jia
...
Jonathan Shen
V. Wan
Yu Zhang
Yonghui Wu
R. Clark
50
9
0
28 Aug 2022
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot
  Text-To-Speech (TTS)
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)
Ariadna Sánchez
Alessio Falai
Ziyao Zhang
Orazio Angelini
K. Yanagisawa
90
7
0
04 Jul 2022
Mix and Match: An Empirical Study on Training Corpus Composition for
  Polyglot Text-To-Speech (TTS)
Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)
Ziyao Zhang
Alessio Falai
Ariadna Sánchez
Orazio Angelini
K. Yanagisawa
51
4
0
04 Jul 2022
Talking Face Generation with Multilingual TTS
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song
Sanghyun Woo
Junhyeok Lee
S. Yang
Hyunjae Cho
Youseong Lee
Dongho Choi
Kang-Wook Kim
CVBM
77
22
0
13 May 2022
12
Next