ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1812.04342
  4. Cited By
Learning latent representations for style control and transfer in
  end-to-end speech synthesis
v1v2 (latest)

Learning latent representations for style control and transfer in end-to-end speech synthesis

11 December 2018
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
    BDLSSLDRL
ArXiv (abs)PDFHTML

Papers citing "Learning latent representations for style control and transfer in end-to-end speech synthesis"

50 / 119 papers shown
Title
Model See Model Do: Speech-Driven Facial Animation with Style Control
Model See Model Do: Speech-Driven Facial Animation with Style Control
Yifang Pan
Karan Singh
Luiz Gustavo Hafemann
DiffM
85
0
0
02 May 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
121
6
0
26 Dec 2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen
Run Chen
Julia Hirschberg
82
3
0
01 Oct 2024
Disentangling segmental and prosodic factors to non-native speech
  comprehensibility
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
78
1
0
20 Aug 2024
MeLFusion: Synthesizing Music from Image and Language Cues using
  Diffusion Models
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
89
8
0
07 Jun 2024
RSET: Remapping-based Sorting Method for Emotion Transfer Speech
  Synthesis
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
Haoxiang Shi
Jianzong Wang
Xulong Zhang
Ning Cheng
Jun Yu
Jing Xiao
73
2
0
27 May 2024
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive
  Text-to-Speech Synthesis
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
85
14
0
17 Dec 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human
  Annotations
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
Hanglei Zhang
Yiwei Guo
Sen Liu
Xie Chen
Kai Yu
55
1
0
02 Nov 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling
  for Zero-Shot Voice Cloning
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
77
4
0
06 Oct 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
93
3
0
06 Oct 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice
  Conversion by Multi-scale Style Modeling
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
114
4
0
03 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for
  Text-to-Speech -- A Study between English and Mandarin
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
85
9
0
02 Sep 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive
  Text-to-Speech Synthesis
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
62
1
0
30 Aug 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
87
1
0
30 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with
  Disentangled Representations
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wen Wang
Yang Song
S. Jha
81
8
0
24 Aug 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context
  Information for Expressive Speech Synthesis
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
87
7
0
29 Jul 2023
Backdoor Attacks against Voice Recognition Systems: A Survey
Backdoor Attacks against Voice Recognition Systems: A Survey
Baochen Yan
Jiahe Lan
Zheng Yan
AAML
80
12
0
23 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
79
2
0
20 Jul 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and
  Diffusion Bridge
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
87
6
0
07 Jun 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural
  Language Descriptions
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Guanghou Liu
Yongmao Zhang
Yinjiao Lei
Yunlin Chen
Rui Wang
Zhifei Li
Linfu Xie
70
42
0
31 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
59
0
0
12 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by
  Unsupervised Learning from Voice Recordings
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
73
1
0
09 May 2023
DSVAE: Interpretable Disentangled Representation for Synthetic Speech
  Detection
DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection
Amit Kumar Singh Yadav
Kratika Bhagtani
Ziyue Xiang
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
DRL
68
6
0
06 Apr 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised
  Style Extractor and Hierarchical Modeling in Speech Synthesis
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Ying Zhang
Xiaorui Wang
Zhong-ming Wang
77
9
0
14 Mar 2023
Do Prosody Transfer Models Transfer Prosody?
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
65
8
0
07 Mar 2023
On granularity of prosodic representations in expressive text-to-speech
On granularity of prosodic representations in expressive text-to-speech
Mikolaj Babianski
Kamil Pokora
Raahil Shah
Rafał Sienkiewicz
Daniel Korzekwa
V. Klimkov
66
6
0
26 Jan 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for
  Expressive Speech-to-Speech Translation
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
81
16
0
25 Jan 2023
Computational Charisma -- A Brick by Brick Blueprint for Building
  Charismatic Artificial Intelligence
Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence
Björn W. Schuller
Shahin Amiriparian
A. Batliner
Alexander Gebhard
Maurice Gerczuk
Vincent Karas
Alexander Kathan
Lennart Seizer
Johanna Löchner
179
4
0
31 Dec 2022
Emotion Selectable End-to-End Text-based Speech Editing
Emotion Selectable End-to-End Text-based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
Chu Yuan Zhang
76
2
0
20 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech
  Reconstruction
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
85
10
0
14 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and
  Speaker-wise Normalization in Speech Synthesis
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
71
6
0
13 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level
  prosodic representations
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
66
7
0
29 Nov 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors
  Decoupling
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling
Xinfa Zhu
Yinjiao Lei
Kun Song
Yongmao Zhang
Tao Li
Linfu Xie
75
17
0
19 Nov 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
Ya-Jie Zhang
Wei Song
Ya Yue
Zhengchen Zhang
Youzheng Wu
Xiaodong He
64
7
0
11 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational
  Autoencoder
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
83
6
0
07 Nov 2022
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style
  Disentanglement
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement
Wei Song
Ya Yue
Ya-Jie Zhang
Zhengchen Zhang
Youzheng Wu
Xiaodong He
92
4
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic
  latents prediction for Expressive Speech Synthesis
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
60
1
0
01 Nov 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker
  TTS with Accents
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents
Yongmao Zhang
Zhichao Wang
Pei-Yin Yang
Hongshen Sun
Zhisheng Wang
Linfu Xie
82
6
0
31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context
  conditioning, utterance embeddings, and reference encoders
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
50
0
0
28 Oct 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical
  Approach For Accent Transfer Tasks
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
L. Finkelstein
Heiga Zen
Norman Casagrande
Chun-an Chan
Ye Jia
...
Jonathan Shen
V. Wan
Yu Zhang
Yonghui Wu
R. Clark
55
9
0
28 Aug 2022
Pathway to Future Symbiotic Creativity
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
66
0
0
18 Aug 2022
Speech Synthesis with Mixed Emotions
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B.W.Schuller
Haizhou Li
87
47
0
11 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech
  Synthesis
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
92
4
0
03 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational
  text-to-speech via F0-conditioned data augmentation
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini
Goeric Huybrechts
M. Ribeiro
Adam Gabry's
Jaime Lorenzo-Trueba
67
5
0
29 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
87
10
0
13 Jul 2022
PoeticTTS -- Controllable Poetry Reading for Literary Studies
PoeticTTS -- Controllable Poetry Reading for Literary Studies
Julia Koch
Florian Lux
Nadja Schauffler
T. Bernhart
Felix Dieterle
Jonas Kuhn
Sandra Richter
Gabriel Viehhauser
Ngoc Thang Vu
66
5
0
11 Jul 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for
  End-to-End Speech Synthesis
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
101
16
0
04 Jul 2022
Language Model-Based Emotion Prediction Methods for Emotional Speech
  Synthesis Systems
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
Hyun-Wook Yoon
Ohsung Kwon
Hoyeon Lee
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
Min-Jae Hwang
128
15
0
30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis
  using ranking support vector machine with variational autoencoder
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song
Ryuichi Yamamoto
Ohsung Kwon
Chan Song
Min-Jae Hwang
Suhyeon Oh
Hyun-Wook Yoon
Jin-Seob Kim
Jae-Min Kim
78
7
0
30 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many
  Fine-Grained Prosody Transfer
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
S. Karlapati
Penny Karanasou
Mateusz Lajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Raymond Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
80
15
0
27 Jun 2022
123
Next