AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

Interspeech (Interspeech), 2022
1 April 2022
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM, DiffM

Papers citing "AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios"

42 / 42 papers shown
Title
Zero-Shot Text-to-Speech for Vietnamese
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Thi Vu
L. T. Nguyen
Dat Quoc Nguyen
174
2
0
02 Jun 2025
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu
Ziyue Zhang
Ban Ma-bao
Yuqing Cai
Yongbin Yu
Renzeng Duojie
Xiangxiang Wang
Fan Gao
Cheng Huang
Nyima Tashi
221
3
0
20 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
314
3
0
01 May 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
309
0
0
21 Feb 2025
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Qianniu Chen
Xiaoyang Hao
Yangqiu Song
Yunxing Liu
Li Lu
202
0
0
15 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
150
0
0
31 Dec 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
221
9
0
16 Sep 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schlüter
S. Sakti
164
4
0
31 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
219
8
0
03 Jul 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
164
11
0
18 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
266
13
0
10 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
215
13
0
05 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
174
6
0
04 Jun 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
196
16
0
28 Apr 2024
PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir Memon
449
2
0
28 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
326
110
0
12 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
205
7
0
18 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
149
10
0
05 Jan 2024
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhaoxi Mu
Xinyu Yang
Sining Sun
Qing Yang
SSL
241
12
0
16 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Computer Vision and Pattern Recognition (CVPR), 2023
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
311
15
0
05 Dec 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
Sébastien Marcel
CVBM
193
11
0
29 Nov 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Interspeech (Interspeech), 2023
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
146
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
149
9
0
26 Oct 2023
Large-Scale Automatic Audiobook Creation
Interspeech (Interspeech), 2023
Brendan Walsh
Mark Hamilton
Greg Newby
Xi Wang
Serena Ruan
...
Lei He
Shaofei Zhang
Eric Dettinger
William T. Freeman
Markus Weimer
184
2
0
07 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM, DiffM
244
67
0
05 Sep 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Interspeech (Interspeech), 2023
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
138
2
0
28 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Interspeech (Interspeech), 2023
Wen Wang
Yang Song
S. Jha
135
14
0
24 Aug 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
International Conference on Learning Representations (ICLR), 2023
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
248
71
0
14 Jul 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
Speech Synthesis Workshop (SSW), 2023
Daria Diatlova
V. Shutov
168
21
0
28 Jun 2023
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation
Interspeech (Interspeech), 2023
Jun Chen
Wei Rao
Zehao Wang
Jiuxin Lin
Yukai Ju
Shulin He
Yannan Wang
Zhiyong Wu
154
19
0
28 Jun 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Interspeech (Interspeech), 2023
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
161
26
0
28 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
216
95
0
06 Jun 2023
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Interspeech (Interspeech), 2023
Ambuj Mehrish
Abhinav Ramesh Kashyap
Yingting Li
Navonil Majumder
Soujanya Poria
160
13
0
29 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Interspeech (Interspeech), 2023
Minki Kang
Wooseok Han
Sung Ju Hwang
Eunho Yang
DiffM
161
27
0
23 May 2023
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios
Interspeech (Interspeech), 2023
Yuyue Wang
Huanhou Xiao
Yihan Wu
Ruihua Song
102
0
0
20 May 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
366
995
0
05 Jan 2023
Memories are One-to-Many Mapping Alleviators in Talking Face Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Anni Tang
Tianyu He
Xuejiao Tan
Jun Ling
Liang Song
CVBM
285
27
0
09 Dec 2022
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yihan Wu
Junliang Guo
Xuejiao Tan
Chen Zhang
Bohan Li
Ruihua Song
Lei He
Sheng Zhao
Arul Menezes
Jiang Bian
135
25
0
30 Nov 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation
Nobuyuki Morioka
Heiga Zen
Nanxin Chen
Yu Zhang
Yifan Ding
162
18
0
28 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
194
25
0
21 Oct 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim
Heeseung Kim
Sung-Hoon Yoon
DiffM
375
61
0
30 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
293
283
0
09 May 2022