1907.04448
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Interspeech (Interspeech), 2019
9 July 2019
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Zhiwen Chen
RJ Skerry-Ryan
Ye Jia
Andrew Rosenberg
Bhuvana Ramabhadran
Papers citing
"Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning"
50 / 100 papers shown
Randomness from causally independent processes
Martin Sandfuchs
Carla Ferradini
R. Renner
CML
196
0
0
06 Oct 2025
Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
Alessio Falai
Ziyao Zhang
Akos Gangoly
134
0
0
25 Aug 2025
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
Meng-Ping Lin
Enoch Hsin-Ho Huang
Shao-Yi Chien
Yu Tsao
131
0
0
19 Aug 2025
Toward Machine Interpreting: Lessons from Human Interpreting Studies
Matthias Sperber
Maureen de Seyssel
Jiajun Bao
Matthias Paulik
AI4CE
192
2
0
11 Aug 2025
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Pranav Pawar
Akshansh Dwivedi
Jenish Boricha
Himanshu Gohil
Aditya Dubey
217
1
0
19 Jun 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
426
6
0
01 May 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
389
2
0
31 Dec 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Taejun Bak
Youngsik Eom
SeungJae Choi
Young-Sun Joo
257
2
0
04 Oct 2024
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Youngjae Kim
Yejin Jeon
Gary Geunbae Lee
308
1
0
27 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Spoken Language Technology Workshop (SLT), 2024
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
336
10
0
12 Sep 2024
A multilingual training strategy for low resource Text to Speech
Asma Amalas
Mounir Ghogho
Mohamed Chetouani
Rachid Oulad Haj Thami
303
3
0
02 Sep 2024
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Khai-Nguyen Nguyen
Quy-Anh Dang
Tan-Hanh Pham
Truong-Son Hy
319
1
0
08 Aug 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
190
5
0
13 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar
Nirmesh Shah
Sai Akarsh
Pankaj Wasnik
R. Shah
225
5
0
12 Jun 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
202
1
0
20 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
291
2
0
06 Mar 2024
G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Juan Zhang
Jiahao Chen
Cheng Wang
Zhi-Yang Yu
Tangquan Qi
Di Wu
CVBM
305
0
0
28 Feb 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
307
39
0
22 Dec 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries
Joel Frank
Franziska Herbert
Jonas Ricker
Lea Schonherr
Thorsten Eisenhofer
Asja Fischer
Markus Dürmuth
Thorsten Holz
282
35
0
10 Dec 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
299
6
0
06 Oct 2023
BiSinger: Bilingual Singing Voice Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2023
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
256
7
0
25 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Automatic Speech Recognition & Understanding (ASRU), 2023
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
216
15
0
24 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Automatic Speech Recognition & Understanding (ASRU), 2023
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
219
8
0
22 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
International Conference on Neural Information Processing (ICONIP), 2023
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
259
1
0
15 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
258
18
0
02 Sep 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
265
10
0
03 Aug 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech
Interspeech (Interspeech), 2023
Yahuan Cong
Haoyu Zhang
Hao-Ping Lin
Shichao Liu
Chunfeng Wang
Yi Ren
Xiang Yin
Zejun Ma
149
1
0
27 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Interspeech (Interspeech), 2023
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
194
11
0
25 Jun 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Interspeech (Interspeech), 2023
Kun Song
Yi Ren
Yinjiao Lei
Chunfeng Wang
Kun Wei
Linfu Xie
Xiang Yin
Zejun Ma
271
11
0
28 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of machine learning research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
490
569
0
22 May 2023
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah
Vishal Tambrahalli
Saiteja Kosgi
N. Pedanekar
Vineet Gandhi
183
1
0
19 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
284
6
0
09 May 2023
Generative AI for learning: Investigating the potential of synthetic learning videos
Daniel Leiker
Ashley Ricker Gyllen
Ismail Eldesouky
M. Cukurova
180
30
0
07 Apr 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Suhee Jo
Younggun Lee
Yookyung Shin
Yeongtae Hwang
Taesu Kim
238
7
0
15 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
397
773
0
07 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
423
252
0
07 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
Findings (Findings), 2023
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
429
11
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
277
10
0
28 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
210
7
0
24 Jan 2023
Modelling low-resource accents without accent-specific TTS frontend
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Georgi Tinchev
Marta Czarnowska
Kamil Deja
K. Yanagisawa
Marius Cotescu
170
5
0
11 Jan 2023
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
146
1
0
07 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Speech Communication (Speech Commun.), 2022
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
182
10
0
29 Nov 2022
Voice-preserving Zero-shot Multiple Accent Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Mumin Jin
Prashant Serai
Jilong Wu
Andros Tjandra
Vimal Manohar
Qing He
296
20
0
23 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
235
2
0
06 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
251
3
0
31 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
306
22
0
27 Oct 2022
Explicit Intensity Control for Accented Text-to-speech
Interspeech (Interspeech), 2022
Rui Liu
Haolin Zuo
De Hu
Guanglai Gao
Haizhou Li
270
9
0
27 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
365
27
0
12 Oct 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
241
6
0
22 Sep 2022
Deep Speech Synthesis from Articulatory Representations
Interspeech (Interspeech), 2022
Peter Wu
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
236
34
0
13 Sep 2022