ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.09017
  4. Cited By
Style Tokens: Unsupervised Style Modeling, Control and Transfer in
  End-to-End Speech Synthesis

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
ArXiv (abs)PDFHTML

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

25 / 275 papers shown
Title
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual
  Transfer Learning
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning
Tao Tu
Yuan-Jui Chen
Cheng-chieh Yeh
Hung-yi Lee
93
88
0
13 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model
Direct speech-to-speech translation with a sequence-to-sequence model
Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhiwen Chen
Yonghui Wu
101
230
0
12 Apr 2019
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Haohan Guo
Frank Soong
Lei He
Lei Xie
71
30
0
09 Apr 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
164
959
0
05 Apr 2019
In Other News: A Bi-style Text-to-speech Model for Synthesizing
  Newscaster Voice with Limited Data
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data
N. Prateek
Mateusz Lajszczak
Roberto Barra-Chicote
Thomas Drugman
Jaime Lorenzo-Trueba
Thomas Merritt
S. Ronanki
Trevor Wood
87
30
0
04 Apr 2019
Multi-reference Tacotron by Intercross Training for Style
  Disentangling,Transfer and Control in Speech Synthesis
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
77
46
0
04 Apr 2019
Visualization and Interpretation of Latent Spaces for Controlling
  Expressive Speech Synthesis through Audio Analysis
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis
Noé Tits
Fengna Wang
Kevin El Haddad
Vincent Pagel
Thierry Dutoit
DiffM
88
39
0
27 Mar 2019
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick
  Enrolling New Speaker and Enhancing Premium Voice
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice
Yan Deng
Lei He
Frank Soong
114
29
0
13 Dec 2018
Learning latent representations for style control and transfer in
  end-to-end speech synthesis
Learning latent representations for style control and transfer in end-to-end speech synthesis
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDLSSLDRL
97
229
0
11 Dec 2018
Robust and fine-grained prosody control of end-to-end speech synthesis
Robust and fine-grained prosody control of end-to-end speech synthesis
Younggun Lee
Jonathan Le Roux
88
147
0
06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text
  Translation
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
98
163
0
05 Nov 2018
Investigating context features hidden in End-to-End TTS
Investigating context features hidden in End-to-End TTS
Kohki Mametani
T. Kato
Seiichi Yamamoto
46
9
0
04 Nov 2018
Training Neural Speech Recognition Systems with Synthetic Speech
  Augmentation
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation
Jason Chun Lok Li
R. Gadde
Boris Ginsburg
Vitaly Lavrukhin
63
55
0
02 Nov 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis
Hierarchical Generative Modeling for Controllable Speech Synthesis
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
...
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
93
276
0
16 Oct 2018
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
Azam Rabiee
Geonmin Kim
Tae-Ho Kim
Soo-Young Lee
18
1
0
12 Oct 2018
Semi-Supervised Training for Improving Data Efficiency in End-to-End
  Speech Synthesis
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung
Yuxuan Wang
Wei-Ning Hsu
Yu Zhang
RJ Skerry-Ryan
87
117
0
30 Aug 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech
  Synthesis
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan
73
122
0
04 Aug 2018
Wasserstein GAN and Waveform Loss-based Acoustic Model Training for
  Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder
Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder
Yi Zhao
Shinji Takaki
Hieu-Thi Luong
Junichi Yamagishi
Daisuke Saito
Nobuaki Minematsu
92
64
0
31 Jul 2018
Scaling and bias codes for modeling speaker-adaptive DNN-based speech
  synthesis systems
Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems
Hieu-Thi Luong
Junichi Yamagishi
117
7
0
31 Jul 2018
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable
  Speech Synthesis
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis
G. Henter
Jaime Lorenzo-Trueba
Xin Wang
Junichi Yamagishi
DRLSSL
88
61
0
30 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker
  Text-To-Speech Synthesis
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
270
838
0
12 Jun 2018
Disentangling by Partitioning: A Representation Learning Framework for
  Multimodal Sensory Data
Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data
Wei-Ning Hsu
James R. Glass
DRL
79
43
0
29 May 2018
Automatic Documentation of ICD Codes with Far-Field Speech Recognition
Automatic Documentation of ICD Codes with Far-Field Speech Recognition
Albert Haque
Corinna Fukushima
23
0
0
30 Apr 2018
Conditional End-to-End Audio Transforms
Conditional End-to-End Audio Transforms
Albert Haque
Michelle Guo
Prateek Verma
114
41
0
30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with
  Tacotron
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
56
555
0
24 Mar 2018
Previous
123456