ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.13662
  4. Cited By
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with
  Natural Language Style Prompt

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

31 January 2023
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
    DiffM
    VLM
ArXivPDFHTML

Papers citing "InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt"

22 / 72 papers shown
Title
VoiceLDM: Text-to-Speech with Environmental Context
VoiceLDM: Text-to-Speech with Environmental Context
Yeong-Won Lee
In-won Yeon
Juhan Nam
Joon Son Chung
VLM
DiffM
10
10
0
24 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics
  Description for Prompt-based Control
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
13
9
0
24 Sep 2023
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by
  Natural Language Prompts
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts
Jixun Yao
Yuguang Yang
Yinjiao Lei
Ziqian Ning
Yanni Hu
Y. Pan
Jingjing Yin
Hongbin Zhou
Heng Lu
Linfu Xie
DiffM
14
19
0
17 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech
  Using Natural Language Descriptions
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
11
19
0
15 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
31
40
0
05 Sep 2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language
  Text-to-Speech Models
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Ziyue Jiang
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
25
34
0
28 Aug 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
12
2
0
20 Jul 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
Daria Diatlova
V. Shutov
13
7
0
28 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with
  Contextual VQ-Diffusion and Vocoding
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
8
41
0
13 Jun 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural
  Language Descriptions
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Guanghou Liu
Yongmao Zhang
Yinjiao Lei
Yunlin Chen
Rui Wang
Zhifei Li
Linfu Xie
8
36
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
21
26
0
30 May 2023
Controllable Speaking Styles Using a Large Language Model
Controllable Speaking Styles Using a Large Language Model
A. Sigurgeirsson
Simon King
14
2
0
17 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio
  Codec
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
138
118
0
04 May 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion
  Models
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
304
0
30 Jan 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for
  Noise-robust Expressive TTS
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
29
18
0
04 Nov 2022
Improved Vector Quantized Diffusion Models
Improved Vector Quantized Diffusion Models
Zhicong Tang
Shuyang Gu
Jianmin Bao
Dong Chen
Fang Wen
DiffM
173
63
0
31 May 2022
Diffusion-LM Improves Controllable Text Generation
Diffusion-LM Improves Controllable Text Generation
Xiang Lisa Li
John Thickstun
Ishaan Gulrajani
Percy Liang
Tatsunori B. Hashimoto
AI4CE
171
768
0
27 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain
  Text-to-Speech
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
115
34
0
15 May 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising
  Diffusion GANs
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
68
65
0
28 Jan 2022
Argmax Flows and Multinomial Diffusion: Learning Categorical
  Distributions
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Emiel Hoogeboom
Didrik Nielsen
P. Jaini
Patrick Forré
Max Welling
DiffM
202
392
0
10 Feb 2021
Transfer Learning from Speaker Verification to Multispeaker
  Text-To-Speech Synthesis
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Z. Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
201
817
0
12 Jun 2018
Image-to-Image Translation with Conditional Adversarial Networks
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
212
19,191
0
21 Nov 2016
Previous
12