Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

50 / 275 papers shown

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing Richard He Bai Renjie Zheng Junkun Chen Xintong Li Mingbo Ma Liang Huang 119 53 0 18 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis Pengyu Cheng Zhenhua Ling 72 3 0 02 Mar 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen KELM 69 20 0 21 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge Jiangyan Yi Ruibo Fu J. Tao Shuai Nie Haoxin Ma ... Le Xu Zhengqi Wen Haizhou Li Zheng Lian Bin Liu 77 185 0 17 Feb 2022
Cross-speaker style transfer for text-to-speech using data augmentation M. Ribeiro Julian Roth Giulia Comini Goeric Huybrechts Adam Gabryś Jaime Lorenzo-Trueba 66 21 0 10 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 155 18 0 24 Jan 2022
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end Rem Hida Masaki Hamada Chie Kamada E. Tsunoo Toshiyuki Sekiya Toshiyuki Kumakura 34 7 0 24 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 109 103 0 19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 79 75 0 17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 172 58 0 10 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion Wendong Gan Bolong Wen Yin Yan Haitao Chen Zhichao Wang Hongqiang Du Lei Xie Kaixuan Guo Hai Li 85 14 0 02 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 86 11 0 23 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 81 27 0 25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 60 11 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 62 4 0 19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 72 18 0 19 Nov 2021
Speaker Generation Daisy Stanton Matt Shannon Soroosh Mariooryad RJ Skerry-Ryan Eric Battenberg Tom Bagby David Kao 96 30 0 07 Nov 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 63 17 0 07 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning Shijun Wang Dimche Kostadinov Damian Borth 86 11 0 27 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 Yanqing Liu Rui Shao G. Wang Kuan Chen Bohan Li Pong C. Yuen Jinzhu Li Lei He Sheng Zhao 89 55 0 25 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition Ting-Yao Hu Mohammadreza Armandpour A. Shrivastava Jen-Hao Rick Chang H. Koppula Oncel Tuzel SyDa 87 42 0 21 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation Fengyu Yang Jian Luan Yujun Wang 137 5 0 19 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research Tomoki Hayashi Ryuichi Yamamoto Takenori Yoshimura Peter Wu Jiatong Shi Takaaki Saeki Yooncheol Ju Yusuke Yasuda Shinnosuke Takamichi Shinji Watanabe VLM 85 63 0 15 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis Li-Wei Chen Alexander I. Rudnicky 169 31 0 12 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 62 16 0 08 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 78 23 0 06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 111 16 0 06 Oct 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS Shilu Lin Wenchao Su Li Meng Fenglong Xie Xinhui Li Li Lu 121 4 0 28 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 72 3 0 22 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit Changhan Wang Wei-Ning Hsu Yossi Adi Adam Polyak Ann Lee Peng-Jen Chen Jiatao Gu J. Pino VLM 106 32 0 14 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 67 47 0 14 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis Songxiang Liu Shan Yang Jane Polak Scowcroft Dong Yu AI4TS 62 10 0 08 Sep 2021
A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations Mingzhi Yu Diane Litman Shuang Ma Jian Wu 123 1 0 04 Sep 2021
Enhancing audio quality for expressive Neural Text-to-Speech Abdelhamid Ezzerg Adam Gabryś Bartosz Putrycz Daniel Korzekwa Daniel Sáez-Trigueros David McHardy Kamil Pokora Jakub Lachowicz Jaime Lorenzo-Trueba V. Klimkov 132 6 0 13 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis Julian Zaïdi Hugo Seuté Benjamin van Niekerk M. Carbonneau 61 21 0 04 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 92 23 0 27 Jul 2021
On Prosody Modeling for ASR+TTS based Voice Conversion Wen-Chin Huang Tomoki Hayashi Xinjian Li Shinji Watanabe Tomoki Toda 73 9 0 20 Jul 2021
Generative Pretraining for Paraphrase Evaluation J. Weston R. Lenain U. Meepegama E. Fristed AIMat 59 10 0 17 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 68 17 0 17 Jul 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information Qinghua Wu Quanbo Shen Jian Luan YuJun Wang 72 4 0 07 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance Hieu-Thi Luong Junichi Yamagishi 85 0 0 25 Jun 2021
Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn? Nicolas Müller Franziska Dieckmann Pavel Czempin Roman Canals Konstantin Böttinger Jennifer Williams 106 71 0 23 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 77 3 0 21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Jane Polak Scowcroft 73 30 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 119 9 0 18 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model Chenye Cui Yi Ren Jinglin Liu Feiyang Chen Rongjie Huang Ming Lei Zhou Zhao 66 35 0 17 Jun 2021
Global Rhythm Style Transfer Without Text Transcriptions Kaizhi Qian Yang Zhang Shiyu Chang Jinjun Xiong Chuang Gan David D. Cox M. Hasegawa-Johnson 78 20 0 16 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 55 17 0 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 74 7 0 14 Jun 2021