2406.04904
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
7 June 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
Logan Hart
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
Papers citing "XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model"
47 / 47 papers shown
Title
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
79
1
0
07 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
32
0
0
01 May 2025
Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices
Éva Székely
Jūra Miniota
Míša Hejná
25
0
0
14 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
H. Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
49
0
0
29 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
54
0
0
14 Mar 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
S.
Mohammed Irfan Kurpath
Sahal Shaji Mullappilly
Jean Lahoud
Fahad A Khan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
AuLLM
72
0
0
06 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
X. Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
S. Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
73
10
0
03 Mar 2025
Steganography Beyond Space-Time with Chain of Multimodal AI
Ching-Chun Chang
Isao Echizen
69
0
0
25 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
88
0
0
21 Feb 2025
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wei Deng
Siyi Zhou
Jingchen Shu
Jinchao Wang
Lu Wang
VLM
42
1
0
08 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
59
2
0
07 Feb 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
67
0
0
23 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
42
0
0
31 Dec 2024
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang
Weiye Xiao
Zhengyu Lin
H. Zhang
Tianxiang Ren
Yang Gao
Zhiqian Lin
Zhongang Cai
Lei Yang
Ziwei Liu
79
3
0
29 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
114
1
0
22 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Y. Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
35
10
0
02 Nov 2024
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang
Wei Hao
Aroon Sankoh
William Lin
Emanuel Mendiola-Ortiz
Junfeng Yang
Chengzhi Mao
AAML
26
2
0
31 Oct 2024
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
Dake Guo
J.-H. Yao
Xinfa Zhu
Kangxiang Xia
Zhao Guo
Ziyu Zhang
Y. Wang
Jie Liu
Lei Xie
18
0
0
31 Oct 2024
Estuary: A Framework For Building Multimodal Low-Latency Real-Time Socially Interactive Agents
Spencer Lin
Basem Rizk
Miru Jun
Andy Artze
Caitlin Sullivan
Sharon Mozgai
Scott Fisher
13
1
0
26 Oct 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand
Praveen Srinivasa Varadhan
Mehak Singal
Mitesh M. Khapra
13
0
0
23 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
30
2
0
16 Oct 2024
Unsupervised Data Validation Methods for Efficient Model Training
Yurii Paniv
27
1
0
10 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
16
0
0
09 Oct 2024
Augmentation through Laundering Attacks for Audio Spoof Detection
Hashim Ali
Surya Subramani
Hafiz Malik
16
0
0
01 Oct 2024
Zero-Shot Text-to-Speech from Continuous Text Streams
Trung D. Q. Dang
David Aponte
Dung Tran
Tianyi Chen
K. Koishida
AuLLM
VLM
24
3
0
01 Oct 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
41
1
0
28 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
Nohil Park
Heeseung Kim
Che Hyun Lee
Jooyoung Choi
Jiheum Yeom
Sungroh Yoon
20
2
0
24 Sep 2024
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
Jiheum Yeom
Heeseung Kim
Jooyoung Choi
Che Hyun Lee
Nohil Park
Sungroh Yoon
19
1
0
24 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
54
2
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
46
3
0
23 Sep 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
29
2
0
18 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
22
1
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
49
0
0
14 Sep 2024
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
Li-Wei Chen
Hung-Shin Lee
Chen-Chi Chang
VLM
22
0
0
03 Sep 2024
A Framework for Synthetic Audio Conversations Generation using Large Language Models
Kaung Myat Kyaw
Jonathan Hoyin Chan
SyDa
29
2
0
02 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
23
37
0
01 Sep 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
58
0
0
30 Aug 2024
Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis
Zehai Tu
Guangyan Zhang
Yiting Lu
Adaeze Adigwe
Simon King
Yiwen Guo
27
0
0
29 Aug 2024
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Samuele Cornell
Jordan Darefsky
Zhiyao Duan
Shinji Watanabe
SyDa
68
4
0
17 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
40
37
0
16 Aug 2024
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondřej Klejch
Peter Bell
26
0
0
17 Jul 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
S.
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
85
17
0
23 Jun 2024
Voice Disorder Analysis: a Transformer-based Approach
Alkis Koudounas
Gabriele Ciravegna
M. Fantini
G. Succo
Erika Crosetti
Tania Cerquitelli
Elena Baralis
27
3
0
20 Jun 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
26
5
0
13 Jun 2024
CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
A. S. Soares
A. R. G. Filho
14
5
0
16 Jun 2023
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
171
372
0
04 Dec 2021
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Z. Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
201
817
0
12 Jun 2018