arXiv:1803.09017
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
23 March 2018
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
50 / 275 papers shown
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
Helen Meng
Chao Weng
Jane Polak Scowcroft
93
24
0
11 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS
Liping Chen
Yan Deng
Xi Wang
Frank Soong
Lei He
89
23
0
08 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning
Rayhane Mama
Marc S. Tyndel
Hashiam Kadhim
Cole Clifford
Ragavan Thurairatnam
VGen
112
12
0
08 Jun 2021
Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios
E. Tsunoo
Kentarou Shibata
Chaitanya Narisetty
Yosuke Kashiwagi
Shinji Watanabe
69
12
0
07 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
134
175
0
06 Jun 2021
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
Chenpeng Du
K. Yu
154
20
0
27 May 2021
Learning Robust Latent Representations for Controllable Speech Synthesis
Shakti Kumar
Jithin Pradeep
Hussain Zaidi
DRL
68
6
0
10 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
59
12
0
05 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
94
25
0
20 Apr 2021
Spectrogram Inpainting for Interactive Generation of Instrument Sounds
Théis Bazin
Gaëtan Hadjeres
P. Esling
M. Malt
61
11
0
15 Apr 2021
Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels
Clément Le Moine Veillon
Nicolas Obin
Axel Roebel
35
8
0
15 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
103
6
0
14 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach
Mohammad Zeineldeen
Benedikt Hilmes
Ralf Schlüter
Hermann Ney
72
12
0
12 Apr 2021
Half-Truth: A Partially Fake Audio Detection Dataset
Jiangyan Yi
Ye Bai
J. Tao
Haoxin Ma
Zhengkun Tian
Chenglong Wang
Tao Wang
Ruibo Fu
71
85
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
Helen Meng
64
47
0
08 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Rui Liu
Berrak Sisman
Haizhou Li
69
32
0
03 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark Gales
60
7
0
02 Apr 2021
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-Wook Kim
Seung-won Park
Junhyeok Lee
Myun-chul Joe
76
28
0
02 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
103
84
0
28 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
69
32
0
17 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
119
70
0
06 Mar 2021
Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Jie Wang
Jingbei Li
Xintao Zhao
Zhiyong Wu
Shiyin Kang
Helen Meng
DRL
123
29
0
30 Jan 2021
Expressive Neural Voice Cloning
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
59
30
0
30 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
90
67
0
31 Dec 2020
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
108
38
0
23 Dec 2020
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
55
16
0
23 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
68
5
0
14 Dec 2020
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis
Anurag Chowdhury
Arun Ross
Prabu David
28
5
0
09 Dec 2020
Using previous acoustic context to improve Text-to-Speech synthesis
Pilar Oplustil Gallegos
Simon King
70
11
0
07 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
62
9
0
03 Dec 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
79
74
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
88
56
0
17 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
91
36
0
12 Nov 2020
Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop
M. Marge
C. Espy-Wilson
Roger K. Moore
77
79
0
11 Nov 2020
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
Haoyu Li
Yang Ai
Junichi Yamagishi
76
2
0
10 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis
Erica Cooper
Xin Wang
Yi Zhao
Yusuke Yasuda
Junichi Yamagishi
SyDa
50
3
0
10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
116
21
0
08 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Guanghui Xu
Wei Song
Zhengchen Zhang
Chao Zhang
Xiaodong He
Bowen Zhou
62
50
0
06 Nov 2020
Data Augmentation for End-to-end Code-switching Speech Recognition
Chenpeng Du
Hao Li
Yizhou Lu
Lan Wang
Y. Qian
57
28
0
04 Nov 2020
Speech Synthesis and Control Using Differentiable DSP
Giorgio Fabbro
Vladimir Golkov
Thomas Kemp
Zorah Lähner
78
12
0
28 Oct 2020
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
105
192
0
28 Oct 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition
Xiong Cai
Dongyang Dai
Zhiyong Wu
Xiang Li
Jingbei Li
Helen Meng
94
67
0
26 Oct 2020
Unsupervised Learning of Disentangled Speech Content and Style Representation
Andros Tjandra
Ruoming Pang
Yu Zhang
Shigeki Karita
BDL
DRL
73
15
0
24 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
112
223
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
76
103
0
22 Oct 2020
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono
Kazuna Tsuboi
Kei Sawada
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
BDL
57
24
0
17 Sep 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
78
66
0
14 Sep 2020
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS
Rui Liu
Berrak Sisman
F. Bao
Guanglai Gao
Haizhou Li
41
18
0
11 Aug 2020
Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling
Yeunju Choi
Youngmoon Jung
Hoirin Kim
139
26
0
09 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
112
73
0
04 Aug 2020