Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

50 / 275 papers shown

Title
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech Tomás Nekvinda Ondrej Dusek 72 57 0 03 Aug 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis Fengyu Yang Shan Yang Qinghua Wu Yujun Wang Lei Xie 73 5 0 03 Aug 2020
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning Jaesung Bae Hanbin Bae Young-Sun Joo Junmo Lee Gyeong-Hoon Lee Hoon-Young Cho 73 17 0 30 Jul 2020
Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling Hao Hao Tan Dorien Herremans MGen 60 74 0 29 Jul 2020
Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach Chaitanya Ahuja Dong Won Lee Y. Nakano Louis-Philippe Morency 51 106 0 24 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis Antti Suni Sofoklis Kakouros M. Vainio J. Šimko 68 18 0 29 Jun 2020
Neural voice cloning with a few low-quality samples Sunghee Jung Hoi-Rim Kim 37 3 0 12 Jun 2020
Deep generative models for musical audio synthesis M. Huzaifah L. Wyse 210 20 0 10 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 130 498 0 22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices Sunghee Jung Hoi-Rim Kim 41 5 0 21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis Yusuke Yasuda Xin Wang Junichi Yamagishi AI4TS 76 31 0 20 May 2020
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech Wenjie Li Benlai Tang Xiang Yin Yushi Zhao Wei Li Kang Wang Hao Huang Yuxuan Wang Zejun Ma 70 13 0 19 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 91 67 0 18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation A. Laptev Roman Korostik A. Svischev A. Andrusenko Ivan Medennikov S. Rybin 81 61 0 14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 96 121 0 12 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint Zexin Cai Chuxiong Zhang Ming Li 73 42 0 10 May 2020
Jukebox: A Generative Model for Music Prafulla Dhariwal Heewoo Jun Christine Payne Jong Wook Kim Alec Radford Ilya Sutskever VLM 171 758 0 30 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis Ting-Yao Hu A. Shrivastava Oncel Tuzel C. Dhir 57 32 0 09 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Jing Xiao 52 21 0 04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Yonghui Wu 56 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 98 93 0 06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems Nick Rossenbach Albert Zeyer Ralf Schluter Hermann Ney 95 84 0 19 Dec 2019
Singing Synthesis: with a little help from my attention Orazio Angelini Alexis Moinet K. Yanagisawa Thomas Drugman 61 17 0 12 Dec 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis Junjie Pan Xiang Yin Zhiling Zhang Shichao Liu Yang Zhang Zejun Ma Yuxuan Wang 47 27 0 11 Nov 2019
Emotional speech synthesis with rich and granularized control Seyun Um Sangshin Oh Kyungguen Byun Inseon Jang C. Ahn Hong-Goo Kang 80 90 0 05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 82 149 0 26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency M. Whitehill Shuang Ma Daniel J. McDuff Yale Song 111 35 0 25 Oct 2019
Towards Fine-Grained Prosody Control for Voice Conversion Zheng Lian Zhengqi Wen 70 19 0 24 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit Tomoki Hayashi Ryuichi Yamamoto Katsuki Inoue Takenori Yoshimura Shinji Watanabe Tomoki Toda K. Takeda Yu Zhang Xu Tan VLM 93 205 0 24 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 68 48 0 03 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training Qingyun Dou Yiting Lu Joshua Efiong Mark Gales 62 6 0 26 Sep 2019
Speech Recognition with Augmented Synthesized Speech Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Ye Jia Pedro J. Moreno Yonghui Wu Zelin Wu 69 128 0 25 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis Chengzhu Yu Heng Lu Na Hu Meng Yu Chao Weng ... Deyi Tuo Shiyin Kang Guangzhi Lei Dan Su Dong Yu CVBM 85 118 0 04 Sep 2019
Maximizing Mutual Information for Tacotron Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su Dong Yu 86 16 0 30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck Shuang Ma Daniel J. McDuff Yale Song 89 25 0 19 Aug 2019
Adversarially Trained End-to-end Korean Singing Voice Synthesis System Juheon Lee Hyeong-Seok Choi Chang-Bin Jeon Junghyun Koo Kyogu Lee 84 78 0 06 Aug 2019
Forward-Backward Decoding for Regularizing End-to-End TTS Yibin Zheng Xi Wang Lei He Shifeng Pan Frank Soong Zhengqi Wen J. Tao 48 13 0 18 Jul 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu Zhiwen Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg Bhuvana Ramabhadran 76 189 0 09 Jul 2019
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention Shuang Ma Daniel J. McDuff Yale Song 39 4 0 09 Jul 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach Noé Tits 40 10 0 05 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech V. Klimkov S. Ronanki Jonas Rohnke Thomas Drugman AI4TS 89 82 0 04 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training Peng Wu Zhenhua Ling Li-Juan Liu Yuan Jiang Hong-Chuan Wu Lirong Dai 95 72 0 26 Jun 2019
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders Yin-Jyun Luo Kat R. Agres Dorien Herremans 103 46 0 19 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis Zack Hodari O. Watts Simon King 67 29 0 10 Jun 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis Eric Battenberg Soroosh Mariooryad Daisy Stanton RJ Skerry-Ryan Matt Shannon David Kao Tom Bagby BDL 104 45 0 08 Jun 2019
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems Ohsung Kwon Eunwoo Song Jae-Min Kim Hong-Goo Kang 48 4 0 21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network V. Wan Chun-an Chan Tom Kenter Jakub Vít R. Clark 71 75 0 17 May 2019
Learning to Groove with Inverse Sequence Transformations Jon Gillick Adam Roberts Jesse Engel Douglas Eck David Bamman SLR BDL 77 81 0 14 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition Yi Ren Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 95 102 0 13 May 2019
Incorporating Symbolic Sequential Modeling for Speech Enhancement Chien-Feng Liao Yu Tsao Xugang Lu Hisashi Kawai 50 18 0 30 Apr 2019