ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.00436
  4. Cited By
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

1 April 2022
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
    VLM
    DiffM
ArXivPDFHTML

Papers citing "AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios"

42 / 42 papers shown
Title
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
93
0
0
21 Feb 2025
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Qianniu Chen
Xiaoyang Hao
B. Li
Y. Liu
Li Lu
39
0
0
15 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis
  with Distilled Time-Varying Style Diffusion
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
33
2
0
30 Aug 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data
  Generation in Automatic Speech Recognition
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schluter
S. Sakti
24
2
0
31 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
39
3
0
03 Jul 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible
  Acoustic Reception and Reaction
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
29
4
0
18 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
31
6
0
10 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive
  Modeling of Audio Discrete Codes
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
36
3
0
05 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
28
5
0
04 Jun 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
32
10
0
28 Apr 2024
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir D. Memon
35
0
0
28 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model
  on 100K hours of data
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
33
72
0
12 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition:
  Pre-training with Discretized Visual Speech Representation
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
25
5
0
18 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
24
7
0
05 Jan 2024
Self-Supervised Disentangled Representation Learning for Robust Target
  Speech Extraction
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
Sining Sun
Qing Yang
SSL
18
8
0
16 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation
  with Unified Audio-Visual Speech Representation
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
25
12
0
05 Dec 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual
  Deepfakes
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
S´ebastien Marcel
CVBM
43
4
0
29 Nov 2023
Controllable Generation of Artificial Speaker Embeddings through
  Discovery of Principal Directions
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
15
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
19
6
0
26 Oct 2023
Large-Scale Automatic Audiobook Creation
Large-Scale Automatic Audiobook Creation
Brendan Walsh
Mark Hamilton
Greg Newby
Xi Wang
Serena Ruan
...
Lei He
Shaofei Zhang
Eric Dettinger
William T. Freeman
Markus Weimer
23
1
0
07 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
39
40
0
05 Sep 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
21
1
0
28 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with
  Disentangled Representations
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wen Wang
Yang Song
S. Jha
29
8
0
24 Aug 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
33
44
0
14 Jul 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
Daria Diatlova
V. Shutov
26
7
0
28 Jun 2023
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale
  Interfusion and Conditional Speaker Modulation
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation
Jun Chen
Wei Rao
Z. Wang
Jiuxin Lin
Yukai Ju
Shulin He
Yannan Wang
Zhiyong Wu
9
9
0
28 Jun 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
19
21
0
28 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive
  Bias
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
32
73
0
06 Jun 2023
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for
  Low-Resource TTS Adaptation
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Ambuj Mehrish
Abhinav Ramesh Kashyap
Yingting Li
Navonil Majumder
Soujanya Poria
11
7
0
29 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech
  Synthesis with Diffusion and Style-based Models
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
S. Hwang
Eunho Yang
DiffM
23
15
0
23 May 2023
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource
  Scenarios
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios
Yuyue Wang
Huanhou Xiao
Yihan Wu
Ruihua Song
21
0
0
20 May 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
43
639
0
05 Jan 2023
Memories are One-to-Many Mapping Alleviators in Talking Face Generation
Memories are One-to-Many Mapping Alleviators in Talking Face Generation
Anni Tang
Tianyu He
Xuejiao Tan
Jun Ling
Liang Song
CVBM
26
23
0
09 Dec 2022
VideoDubber: Machine Translation with Speech-Aware Length Control for
  Video Dubbing
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
Yihan Wu
Junliang Guo
Xuejiao Tan
Chen Zhang
Bohan Li
Ruihua Song
Lei He
Sheng Zhao
Arul Menezes
Jiang Bian
24
15
0
30 Nov 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation
Nobuyuki Morioka
Heiga Zen
Nanxin Chen
Yu Zhang
Yifan Ding
29
16
0
28 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
30
22
0
21 Oct 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech
  with Untranscribed Data
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim
Heeseung Kim
Sung-Hoon Yoon
DiffM
196
52
0
30 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
  Quality
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
35
211
0
09 May 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice
  Conversion for everyone
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
176
378
0
04 Dec 2021
1