ResearchTrend.AI
arXiv:1803.09017
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

50 / 275 papers shown
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Tomás Nekvinda
Ondrej Dusek
72
57
0
03 Aug 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Fengyu Yang
Shan Yang
Qinghua Wu
Yujun Wang
Lei Xie
73
5
0
03 Aug 2020
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Jaesung Bae
Hanbin Bae
Young-Sun Joo
Junmo Lee
Gyeong-Hoon Lee
Hoon-Young Cho
73
17
0
30 Jul 2020
Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling
Hao Hao Tan
Dorien Herremans
MGen
60
74
0
29 Jul 2020
Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach
Chaitanya Ahuja
Dong Won Lee
Y. Nakano
Louis-Philippe Morency
51
106
0
24 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis
Antti Suni
Sofoklis Kakouros
M. Vainio
J. Šimko
68
18
0
29 Jun 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
37
3
0
12 Jun 2020
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
210
20
0
10 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
130
498
0
22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
41
5
0
21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
AI4TS
76
31
0
20 May 2020
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech
Wenjie Li
Benlai Tang
Xiang Yin
Yushi Zhao
Wei Li
Kang Wang
Hao Huang
Yuxuan Wang
Zejun Ma
70
13
0
19 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
91
67
0
18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
81
61
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
96
121
0
12 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
73
42
0
10 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
171
758
0
30 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
57
32
0
09 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Jing Xiao
52
21
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
56
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
98
93
0
06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schlüter
Hermann Ney
95
84
0
19 Dec 2019
Singing Synthesis: with a little help from my attention
Orazio Angelini
Alexis Moinet
K. Yanagisawa
Thomas Drugman
61
17
0
12 Dec 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Junjie Pan
Xiang Yin
Zhiling Zhang
Shichao Liu
Yang Zhang
Zejun Ma
Yuxuan Wang
47
27
0
11 Nov 2019
Emotional speech synthesis with rich and granularized control
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
80
90
0
05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
82
149
0
26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
M. Whitehill
Shuang Ma
Daniel J. McDuff
Yale Song
111
35
0
25 Oct 2019
Towards Fine-Grained Prosody Control for Voice Conversion
Zheng Lian
Zhengqi Wen
70
19
0
24 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
Tomoki Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
93
205
0
24 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
68
48
0
03 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training
Qingyun Dou
Yiting Lu
Joshua Efiong
Mark Gales
62
6
0
26 Sep 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
69
128
0
25 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Chengzhu Yu
Heng Lu
Na Hu
Meng Yu
Chao Weng
...
Deyi Tuo
Shiyin Kang
Guangzhi Lei
Jane Polak Scowcroft
Dong Yu
CVBM
85
118
0
04 Sep 2019
Maximizing Mutual Information for Tacotron
Peng Liu
Xixin Wu
Shiyin Kang
Guangzhi Li
Jane Polak Scowcroft
Dong Yu
86
16
0
30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
89
25
0
19 Aug 2019
Adversarially Trained End-to-end Korean Singing Voice Synthesis System
Juheon Lee
Hyeong-Seok Choi
Chang-Bin Jeon
Junghyun Koo
Kyogu Lee
84
78
0
06 Aug 2019
Forward-Backward Decoding for Regularizing End-to-End TTS
Yibin Zheng
Xi Wang
Lei He
Shifeng Pan
Frank Soong
Zhengqi Wen
J. Tao
48
13
0
18 Jul 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Zhiwen Chen
RJ Skerry-Ryan
Ye Jia
Andrew Rosenberg
Bhuvana Ramabhadran
76
189
0
09 Jul 2019
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention
Shuang Ma
Daniel J. McDuff
Yale Song
39
4
0
09 Jul 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
40
10
0
05 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech
V. Klimkov
S. Ronanki
Jonas Rohnke
Thomas Drugman
AI4TS
89
82
0
04 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
Peng Wu
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Hong-Chuan Wu
Lirong Dai
95
72
0
26 Jun 2019
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Yin-Jyun Luo
Kat R. Agres
Dorien Herremans
103
46
0
19 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis
Zack Hodari
O. Watts
Simon King
67
29
0
10 Jun 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis
Eric Battenberg
Soroosh Mariooryad
Daisy Stanton
RJ Skerry-Ryan
Matt Shannon
David Kao
Tom Bagby
BDL
104
45
0
08 Jun 2019
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems
Ohsung Kwon
Eunwoo Song
Jae-Min Kim
Hong-Goo Kang
48
4
0
21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
71
75
0
17 May 2019
Learning to Groove with Inverse Sequence Transformations
Jon Gillick
Adam Roberts
Jesse Engel
Douglas Eck
David Bamman
SLR BDL
77
81
0
14 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
Yi Ren
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
95
102
0
13 May 2019
Incorporating Symbolic Sequential Modeling for Speech Enhancement
Chien-Feng Liao
Yu Tsao
Xugang Lu
Hisashi Kawai
50
18
0
30 Apr 2019