1812.04342
Learning latent representations for style control and transfer in end-to-end speech synthesis
11 December 2018
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDL
SSL
DRL
Papers citing
"Learning latent representations for style control and transfer in end-to-end speech synthesis"
50 / 119 papers shown
Title
Model See Model Do: Speech-Driven Facial Animation with Style Control
Yifang Pan
Karan Singh
Luiz Gustavo Hafemann
DiffM
85
0
0
02 May 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
121
6
0
26 Dec 2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen
Run Chen
Julia Hirschberg
82
3
0
01 Oct 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
78
1
0
20 Aug 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
89
8
0
07 Jun 2024
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
Haoxiang Shi
Jianzong Wang
Xulong Zhang
Ning Cheng
Jun Yu
Jing Xiao
73
2
0
27 May 2024
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
85
14
0
17 Dec 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
Hanglei Zhang
Yiwei Guo
Sen Liu
Xie Chen
Kai Yu
55
1
0
02 Nov 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
77
4
0
06 Oct 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
93
3
0
06 Oct 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
114
4
0
03 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
85
9
0
02 Sep 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
62
1
0
30 Aug 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
87
1
0
30 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wen Wang
Yang Song
S. Jha
81
8
0
24 Aug 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
87
7
0
29 Jul 2023
Backdoor Attacks against Voice Recognition Systems: A Survey
Baochen Yan
Jiahe Lan
Zheng Yan
AAML
80
12
0
23 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
79
2
0
20 Jul 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
87
6
0
07 Jun 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Guanghou Liu
Yongmao Zhang
Yinjiao Lei
Yunlin Chen
Rui Wang
Zhifei Li
Linfu Xie
70
42
0
31 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
59
0
0
12 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
73
1
0
09 May 2023
DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection
Amit Kumar Singh Yadav
Kratika Bhagtani
Ziyue Xiang
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
DRL
68
6
0
06 Apr 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Ying Zhang
Xiaorui Wang
Zhong-ming Wang
77
9
0
14 Mar 2023
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
65
8
0
07 Mar 2023
On granularity of prosodic representations in expressive text-to-speech
Mikolaj Babianski
Kamil Pokora
Raahil Shah
Rafał Sienkiewicz
Daniel Korzekwa
V. Klimkov
66
6
0
26 Jan 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
81
16
0
25 Jan 2023
Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence
Björn W. Schuller
Shahin Amiriparian
A. Batliner
Alexander Gebhard
Maurice Gerczuk
Vincent Karas
Alexander Kathan
Lennart Seizer
Johanna Löchner
179
4
0
31 Dec 2022
Emotion Selectable End-to-End Text-based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
Chu Yuan Zhang
76
2
0
20 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taihao Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
85
10
0
14 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
71
6
0
13 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
66
7
0
29 Nov 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling
Xinfa Zhu
Yinjiao Lei
Kun Song
Yongmao Zhang
Tao Li
Linfu Xie
75
17
0
19 Nov 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
Ya-Jie Zhang
Wei Song
Ya Yue
Zhengchen Zhang
Youzheng Wu
Xiaodong He
64
7
0
11 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
83
6
0
07 Nov 2022
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement
Wei Song
Ya Yue
Ya-Jie Zhang
Zhengchen Zhang
Youzheng Wu
Xiaodong He
92
4
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
60
1
0
01 Nov 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents
Yongmao Zhang
Zhichao Wang
Pei-Yin Yang
Hongshen Sun
Zhisheng Wang
Linfu Xie
82
6
0
31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
50
0
0
28 Oct 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
L. Finkelstein
Heiga Zen
Norman Casagrande
Chun-an Chan
Ye Jia
...
Jonathan Shen
V. Wan
Yu Zhang
Yonghui Wu
R. Clark
55
9
0
28 Aug 2022
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
66
0
0
18 Aug 2022
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
87
47
0
11 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
92
4
0
03 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini
Goeric Huybrechts
M. Ribeiro
Adam Gabryś
Jaime Lorenzo-Trueba
67
5
0
29 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
87
10
0
13 Jul 2022
PoeticTTS -- Controllable Poetry Reading for Literary Studies
Julia Koch
Florian Lux
Nadja Schauffler
T. Bernhart
Felix Dieterle
Jonas Kuhn
Sandra Richter
Gabriel Viehhauser
Ngoc Thang Vu
66
5
0
11 Jul 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
101
16
0
04 Jul 2022
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
Hyun-Wook Yoon
Ohsung Kwon
Hoyeon Lee
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
Min-Jae Hwang
128
15
0
30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song
Ryuichi Yamamoto
Ohsung Kwon
Chan Song
Min-Jae Hwang
Suhyeon Oh
Hyun-Wook Yoon
Jin-Seob Kim
Jae-Min Kim
78
7
0
30 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
S. Karlapati
Penny Karanasou
Mateusz Lajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Raymond Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
80
15
0
27 Jun 2022