Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1803.09017
Cited By
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
23 March 2018
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
50 / 275 papers shown
Title
A
3
^3
3
T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Richard He Bai
Renjie Zheng
Junkun Chen
Xintong Li
Mingbo Ma
Liang Huang
119
53
0
18 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
72
3
0
02 Mar 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
KELM
69
20
0
21 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge
Jiangyan Yi
Ruibo Fu
J. Tao
Shuai Nie
Haoxin Ma
...
Le Xu
Zhengqi Wen
Haizhou Li
Zheng Lian
Bin Liu
77
185
0
17 Feb 2022
Cross-speaker style transfer for text-to-speech using data augmentation
M. Ribeiro
Julian Roth
Giulia Comini
Goeric Huybrechts
Adam Gabry's
Jaime Lorenzo-Trueba
66
21
0
10 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
155
18
0
24 Jan 2022
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end
Rem Hida
Masaki Hamada
Chie Kamada
E. Tsunoo
Toshiyuki Sekiya
Toshiyuki Kumakura
34
7
0
24 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
109
103
0
19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
79
75
0
17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
172
58
0
10 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion
Wendong Gan
Bolong Wen
Yin Yan
Haitao Chen
Zhichao Wang
Hongqiang Du
Lei Xie
Kaixuan Guo
Hai Li
85
14
0
02 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios
Qicong Xie
Tao Li
Xinsheng Wang
Zhichao Wang
Lei Xie
Guoqiao Yu
Guanglu Wan
86
11
0
23 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
81
27
0
25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
60
11
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
62
4
0
19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
72
18
0
19 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
96
30
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
63
17
0
07 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
86
11
0
27 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
Pong C. Yuen
Jinzhu Li
Lei He
Sheng Zhao
89
55
0
25 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Ting-Yao Hu
Mohammadreza Armandpour
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
SyDa
87
42
0
21 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Fengyu Yang
Jian Luan
Yujun Wang
137
5
0
19 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
85
63
0
15 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis
Li-Wei Chen
Alexander I. Rudnicky
169
31
0
12 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
62
16
0
08 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
78
23
0
06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
111
16
0
06 Oct 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS
Shilu Lin
Wenchao Su
Li Meng
Fenglong Xie
Xinhui Li
Li Lu
121
4
0
28 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
72
3
0
22 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
Changhan Wang
Wei-Ning Hsu
Yossi Adi
Adam Polyak
Ann Lee
Peng-Jen Chen
Jiatao Gu
J. Pino
VLM
106
32
0
14 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
67
47
0
14 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
Songxiang Liu
Shan Yang
Jane Polak Scowcroft
Dong Yu
AI4TS
62
10
0
08 Sep 2021
A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations
Mingzhi Yu
Diane Litman
Shuang Ma
Jian Wu
123
1
0
04 Sep 2021
Enhancing audio quality for expressive Neural Text-to-Speech
Abdelhamid Ezzerg
Adam Gabry's
Bartosz Putrycz
Daniel Korzekwa
Daniel Sáez-Trigueros
David McHardy
Kamil Pokora
Jakub Lachowicz
Jaime Lorenzo-Trueba
V. Klimkov
132
6
0
13 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
61
21
0
04 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
92
23
0
27 Jul 2021
On Prosody Modeling for ASR+TTS based Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Xinjian Li
Shinji Watanabe
Tomoki Toda
73
9
0
20 Jul 2021
Generative Pretraining for Paraphrase Evaluation
J. Weston
R. Lenain
U. Meepegama
E. Fristed
AIMat
59
10
0
17 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
68
17
0
17 Jul 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information
Qinghua Wu
Quanbo Shen
Jian Luan
YuJun Wang
72
4
0
07 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
Hieu-Thi Luong
Junichi Yamagishi
85
0
0
25 Jun 2021
Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?
Nicolas Müller
Franziska Dieckmann
Pavel Czempin
Roman Canals
Konstantin Böttinger
Jennifer Williams
106
71
0
23 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
77
3
0
21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Jane Polak Scowcroft
73
30
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
119
9
0
18 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
66
35
0
17 Jun 2021
Global Rhythm Style Transfer Without Text Transcriptions
Kaizhi Qian
Yang Zhang
Shiyu Chang
Jinjun Xiong
Chuang Gan
David D. Cox
M. Hasegawa-Johnson
78
20
0
16 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
D. Mohan
Qinmin Hu
Tian Huey Teh
Alexandra Torresquintero
C. Wallis
Marlene Staib
Lorenzo Foglianti
Jiameng Gao
Simon King
55
17
0
15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
74
7
0
14 Jun 2021
Previous
1
2
3
4
5
6
Next