ResearchTrend.AI

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

31 January 2023
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
    DiffM
    VLM

Papers citing "InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt"

50 / 72 papers shown
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
Guanrou Yang
Chen Yang
Qian Chen
Ziyang Ma
Wenxi Chen
...
Fan Yu
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
53
0
0
17 Apr 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
45
0
0
14 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
H. Meng
37
0
0
14 Apr 2025
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
Dan Luo
Chengyuan Ma
Weiqin Li
Jun Wang
Wei Chen
Zhiyong Wu
26
0
0
14 Apr 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin
Jeongsoo Choi
Puyuan Peng
Joon Son Chung
Tae-Hyun Oh
David F. Harwath
VGen
45
1
0
03 Apr 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets
Anuj Diwan
Zhisheng Zheng
David F. Harwath
Eunsol Choi
CLIP
VLM
75
0
0
06 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
44
0
0
01 Mar 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Y. Zhang
Rui Liu
Xiang Yin
Zhou Zhao
64
0
0
26 Feb 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Wenxiang Guo
Yu Zhang
Changhao Pan
Rongjie Huang
Li Tang
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Zhou Zhao
91
2
0
18 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Yuexian Zou
83
2
0
10 Feb 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
Chun-Yi Kuan
Hung-yi Lee
56
0
0
08 Feb 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
Tian-Hao Zhang
Jiawei Zhang
J. Wang
Xinyuan Qian
Xu-cheng Yin
CVBM
45
0
0
02 Jan 2025
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
F. Khan
Mubarak Shah
79
2
0
29 Nov 2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen
Run Chen
Julia Hirschberg
19
3
0
01 Oct 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
24
2
0
26 Sep 2024
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
Xiaoxue Gao
Chen Zhang
Yiming Chen
Huayun Zhang
Nancy F. Chen
26
6
0
16 Sep 2024
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
Xin Jing
Kun Zhou
Andreas Triantafyllopoulos
Björn W. Schuller
DiffM
27
3
0
10 Sep 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
29
52
0
29 Aug 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
18
5
0
28 Aug 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
19
10
0
24 Aug 2024
PRESENT: Zero-Shot Text-to-Prosody Control
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
Dorien Herremans
33
0
0
13 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
19
4
0
12 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
24
4
0
21 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
43
3
0
02 Jul 2024
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li
Meiying Melissa Chen
Yi Zhong
Pinxin Liu
Zhiyao Duan
21
0
0
15 Jun 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
Zhengyang Chen
Xuechen Liu
Erica Cooper
Junichi Yamagishi
Yanmin Qian
30
2
0
13 Jun 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Masaya Kawamura
Ryuichi Yamamoto
Yuma Shirahata
Takuya Hasumi
Kentaro Tachibana
VLM
18
1
0
12 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
23
6
0
10 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
54
24
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
45
7
0
03 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen
Yuchen Hu
Wen Wu
Helin Wang
Chng Eng Siong
Chao Zhang
33
10
0
02 Jun 2024
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra
Zongyang Du
Berrak Sisman
21
2
0
18 May 2024
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Xinlei Niu
Jing Zhang
Charles Patrick Martin
16
0
0
24 Apr 2024
Voice Attribute Editing with Text Prompt
Zheng-Yan Sheng
Yang Ai
Li-Juan Liu
Jia Pan
Zhenhua Ling
21
4
0
13 Apr 2024
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
Yu Pan
Lei Ma
Jianjun Zhao
24
4
0
03 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David F. Harwath
57
55
0
25 Mar 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
14
0
0
20 Mar 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
28
9
0
18 Mar 2024
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
36
5
0
09 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
22
139
0
05 Mar 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
6
35
0
02 Feb 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
13
7
0
05 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
26
37
0
25 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
21
8
0
17 Dec 2023
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
19
18
0
03 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
17
31
0
21 Nov 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
Hanglei Zhang
Yiwei Guo
Sen Liu
Xie Chen
Kai Yu
15
0
0
02 Nov 2023
PromptSpeaker: Speaker Generation Based on Text Descriptions
Yongmao Zhang
Guanghou Liu
Yinjiao Lei
Yunlin Chen
Hao Yin
Lei Xie
Zhifei Li
11
8
0
08 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
20
114
0
01 Oct 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen An Li
Tsung-Yuan Hsu
T. Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
13
5
0
25 Sep 2023