PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions

31 May 2023

Papers citing "PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions"

31 / 31 papers shown

Title
SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation Stephen Brade Sam Anderson Rithesh Kumar Zeyu Jin Anh Truong 31 0 0 07 Apr 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models Kim Sung-Bin Jeongsoo Choi Puyuan Peng Joon Son Chung Tae-Hyun Oh David F. Harwath VGen 45 1 0 03 Apr 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets Anuj Diwan Zhisheng Zheng David F. Harwath Eunsol Choi CLIP VLM 75 0 0 06 Mar 2025
Gender Bias in Instruction-Guided Speech Synthesis Models Chun-Yi Kuan Hung-yi Lee 60 0 0 08 Feb 2025
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control Ryuichi Yamamoto Yuma Shirahata Masaya Kawamura Kentaro Tachibana DiffM 27 2 0 26 Sep 2024
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis Zhiyong Chen Xinnuo Li Zhiqi Ai Shugong Xu DiffM 34 1 0 24 Sep 2024
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models Xin Jing Kun Zhou Andreas Triantafyllopoulos Björn W. Schuller DiffM 27 3 0 10 Sep 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling Yixuan Zhou Xiaoyu Qin Zeyu Jin Shuoyi Zhou Shun Lei Songtao Zhou Zhiyong Wu Jia Jia AuLLM 18 5 0 28 Aug 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization Yuchen Hu Chen Chen Siyin Wang Eng Siong Chng C. Zhang 43 3 0 02 Jul 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems Zhengyang Chen Xuechen Liu Erica Cooper Junichi Yamagishi Yanmin Qian 38 2 0 13 Jun 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning Masaya Kawamura Ryuichi Yamamoto Yuma Shirahata Takuya Hasumi Kentaro Tachibana VLM 22 5 0 12 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Sang-Hoon Lee Seong-Whan Lee 29 6 0 12 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts Thomas Bott Florian Lux Ngoc Thang Vu 31 6 0 10 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec Shengpeng Ji Jia-li Zuo Minghui Fang Siqi Zheng Qian Chen ... Ziyue Jiang Hai Huang Xize Cheng Rongjie Huang Zhou Zhao 45 7 0 03 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback Chen Chen Yuchen Hu Wen Wu Helin Wang Chng Eng Siong Chao Zhang 33 10 0 02 Jun 2024
Voice Attribute Editing with Text Prompt Zheng-Yan Sheng Yang Ai Li-Juan Liu Jia Pan Zhenhua Ling 26 4 0 13 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild Puyuan Peng Po-Yao (Bernie) Huang Daniel Li Abdelrahman Mohamed David F. Harwath 60 55 0 25 Mar 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations Daniel Lyth Simon King 16 35 0 02 Feb 2024
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators Wiebke Hutiri Orestis Papakyriakopoulos Alice Xiang 13 15 0 25 Jan 2024
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis Wenhao Guan Yishuang Li Tao Li Hukai Huang Feng Wang Jiayan Lin Lingyan Huang Lin Li Q. Hong 23 8 0 17 Dec 2023
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction Zhaoxi Mu Xinyu Yang Sining Sun Qing Yang SSL 10 8 0 16 Dec 2023
PromptSpeaker: Speaker Generation Based on Text Descriptions Yongmao Zhang Guanghou Liu Yinjiao Lei Yunlin Chen Hao Yin Lei Xie Zhifei Li 11 11 0 08 Oct 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion Chun-Yi Kuan Chen An Li Tsung-Yuan Hsu T. Lin Ho-Lam Chung Kai-Wei Chang Shuo-yiin Chang Hung-yi Lee 13 5 0 25 Sep 2023
VoiceLDM: Text-to-Speech with Environmental Context Yeong-Won Lee In-won Yeon Juhan Nam Joon Son Chung VLM DiffM 13 10 0 24 Sep 2023
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts Jixun Yao Yuguang Yang Yinjiao Lei Ziqian Ning Yanni Hu Y. Pan Jingjing Yin Hongbin Zhou Heng Lu Linfu Xie DiffM 16 19 0 17 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions Reo Shimizu Ryuichi Yamamoto Masaya Kawamura Yuma Shirahata Hironori Doi Tatsuya Komatsu Kentaro Tachibana DiffM 13 19 0 15 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt Yichong Leng Zhifang Guo Kai Shen Xu Tan Zeqian Ju ... Lei He Xiang-Yang Li Sheng Zhao Tao Qin Jiang Bian VLM DiffM 34 40 0 05 Sep 2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models Shengpeng Ji Jia-li Zuo Minghui Fang Ziyue Jiang Feiyang Chen Xinyu Duan Baoxing Huai Zhou Zhao 25 34 0 28 Aug 2023
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Ming Jiang Linfu Xie 19 15 0 04 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin W. Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 37 26 0 29 Jun 2022
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Z. Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 201 819 0 12 Jun 2018