Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

50 / 275 papers shown

Title
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 107 13 0 13 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 83 6 0 07 Nov 2022
Deliberation Networks and How to Train Them Qingyun Dou Mark Gales 63 0 0 06 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS Dongchao Yang Songxiang Liu Jianwei Yu Helin Wang Chao Weng Yuexian Zou DiffM VLM 85 18 0 04 Nov 2022
Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts Detai Xin Sharath Adavanne F. Ang Ashish Kulkarni Shinnosuke Takamichi Hiroshi Saruwatari 103 14 0 04 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis Konstantinos Klapsas Karolos Nikitaras Nikolaos Ellinas June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 64 0 0 02 Nov 2022
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement Wei Song Ya Yue Ya-Jie Zhang Zhengchen Zhang Youzheng Wu Xiaodong He 92 4 0 02 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers Cheng-Ping Hsieh Subhankar Ghosh Boris Ginsburg 84 18 0 01 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis Karolos Nikitaras Konstantinos Klapsas Nikolaos Ellinas Georgia Maniati June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 60 1 0 01 Nov 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents Yongmao Zhang Zhichao Wang Pei-Yin Yang Hongshen Sun Zhisheng Wang Linfu Xie 82 6 0 31 Oct 2022
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection L. Attorresi Davide Salvi Clara Borrelli Paolo Bestagini Stefano Tubaro 111 24 0 31 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion Jingyi Li Weiping Tu Li Xiao 134 113 0 27 Oct 2022
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge Yuhao Liang Pei-Ning Chen F. Yu Xinfa Zhu Tianyi Xu Linfu Xie 61 0 0 26 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 107 23 0 21 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 94 6 0 12 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos Björn W. Schuller Gokcce .Iymen M. Sezgin Xiangheng He ... Shuo Liu Silvan Mertes Elisabeth André Ruibo Fu Jianhua Tao 115 57 0 06 Oct 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 79 6 0 22 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech Saeed Ghorbani Ylva Ferstl Daniel Holden N. Troje M. Carbonneau 123 83 0 15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech Camille Noufi Lloyd May J. Berger 56 2 0 06 Sep 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks L. Finkelstein Heiga Zen Norman Casagrande Chun-an Chan Ye Jia ... Jonathan Shen V. Wan Yu Zhang Yonghui Wu R. Clark 55 9 0 28 Aug 2022
Speech Synthesis with Mixed Emotions Kun Zhou Berrak Sisman R. Rana B.W.Schuller Haizhou Li 87 47 0 11 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset Xiang Li Changhe Song X. Wei Zhiyong Wu Jia Jia Helen Meng 64 4 0 10 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 92 4 0 03 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabry's Jaime Lorenzo-Trueba 67 5 0 29 Jul 2022
N-Grammer: Augmenting Transformers with latent n-grams Aurko Roy Rohan Anil Guangda Lai Benjamin Lee Jeffrey Zhao ... Yu Phuong Dao Christopher Fifty Zhiwen Chen Yonghui Wu 77 8 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 87 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 78 5 0 13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 100 14 0 13 Jul 2022
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems Hyun-Wook Yoon Ohsung Kwon Hoyeon Lee Ryuichi Yamamoto Eunwoo Song Jae-Min Kim Min-Jae Hwang 128 15 0 30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder Eunwoo Song Ryuichi Yamamoto Ohsung Kwon Chan Song Min-Jae Hwang Suhyeon Oh Hyun-Wook Yoon Jin-Seob Kim Jae-Min Kim 78 7 0 30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Weinan Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 108 27 0 29 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer S. Karlapati Penny Karanasou Mateusz Lajszczak Ammar Abbas Alexis Moinet Peter Makarov Raymond Li Arent van Korlaar Simon Slangen Thomas Drugman 80 15 0 27 Jun 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms Marco Jiralerspong Gauthier Gidel VLM 81 3 0 25 Jun 2022
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis Yihan Wu Xi Wang S. Zhang Lei He Ruihua Song J. Nie 102 15 0 25 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue Kentaro Mitsui Tianyu Zhao Kei Sawada Yukiya Hono Yoshihiko Nankaku K. Tokuda 67 14 0 24 Jun 2022
Automatic Prosody Annotation with Pre-Trained Text-Speech Model Ziqian Dai Jianwei Yu Yan Wang Nuo Chen Yanyao Bian Guangzhi Li Deng Cai Dong Yu 417 8 0 16 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 112 40 0 30 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Rongjie Huang Yi Ren Jinglin Liu Chenye Cui Zhou Zhao OODD VLM 195 34 0 15 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality Xu Tan Jiawei Chen Haohe Liu Jian Cong Chen Zhang ... Lei He Frank Soong Tao Qin Sheng Zhao Tie-Yan Liu 141 221 0 09 May 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech Jaesung Bae Jinhyeok Yang Taejun Bak Young-Sun Joo DiffM 126 6 0 08 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022 Jiameng Gao 59 0 0 08 Apr 2022
Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder Juheon Lee Hyeong-Seok Choi Kyogu Lee 38 7 0 07 Apr 2022
Into-TTS : Intonation Template Based Prosody Control System Jihwan Lee Joun Yeop Lee Heejin Choi Seongkyu Mun Sangjun Park Jae-Sung Bae Chanwoo Kim 135 4 0 04 Apr 2022
On incorporating social speaker characteristics in synthetic speech S. Rallabandi Sebastian Möller 86 0 0 03 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios Yihan Wu Xu Tan Bohan Li Lei He Sheng Zhao Ruihua Song Tao Qin Tie-Yan Liu VLM DiffM 85 69 0 01 Apr 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent Yuki Saito Yuto Nishimura Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari 126 12 0 28 Mar 2022
vTTS: visual-text to speech Yoshifumi Nakano Takaaki Saeki Shinnosuke Takamichi Katsuhito Sudoh Hiroshi Saruwatari 61 4 0 28 Mar 2022
Attacker Attribution of Audio Deepfakes Nicolas Müller Franziska Dieckmann Jennifer Williams 60 15 0 28 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Shiyin Kang Helen Meng 52 12 0 23 Mar 2022
DGC-vector: A new speaker embedding for zero-shot voice conversion Ruitong Xiao Haitong Zhang Yue Lin 54 12 0 18 Mar 2022