EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024

4 November 2024

Papers citing "EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector"

Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models Yizhou Peng Yukun Ma C. Zhang Yi-Wen Chao Chongjia Ni B. Ma 57 0 0 15 Oct 2025
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS Haoxun Li Yu Liu Yuqing Sun Hanlei Shi Leyuan Qu Taihao Li 56 0 0 07 Oct 2025
LibriTTS-VI: A Public Corpus and Novel Methods for Efficient Voice Impression Control Junki Ohmura Yuki Ito E. Tsunoo Toshiyuki Sekiya Toshiyuki Kumakura 71 0 0 19 Sep 2025
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech Siyi Zhou Yiquan Zhou Yi He Xun Zhou Jinchao Wang Wei Deng Jingchen Shu DiffM 127 9 0 23 Jun 2025
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech Interspeech (Interspeech), 2025 Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Seong-Whan Lee 153 0 0 26 May 2025
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling Cheng Yifan Zhang Ruoyi Shi Jiatong 125 1 0 21 May 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting Guanrou Yang Chen Yang Qian Chen Ziyang Ma Wenxi Chen ... Fan Yu Zhihao Du Zhifu Gao Shiliang Zhang Xie Chen AuLLM 442 20 0 17 Apr 2025
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025 Jun-Hak Yun Seung-Bin Kim Seong-Whan Lee DiffM 97 7 0 10 Jan 2025
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions Kun Zhou You Zhang Shengkui Zhao Hao Wang Zexu Pan ... Chongjia Ni Yukun Ma Trung Hieu Nguyen J. Yip Bin Ma 220 10 0 25 Sep 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech Haibin Wu Xiaofei Wang Sefik Emre Eskimez Manthan Thakker Daniel Tompkins ... Canrun Li Zhen Xiao Sheng Zhao Jinyu Li Naoyuki Kanda 214 18 0 17 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Sefik Emre Eskimez Xiaofei Wang Manthan Thakker Canrun Li Chung-Hsien Tsai ... Min Tang Xu Tan Yanqing Liu Sheng Zhao Naoyuki Kanda VLM 201 134 0 26 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Sang-Hoon Lee Seong-Whan Lee 177 29 0 12 Jun 2024
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 Sho Inoue Kun Zhou Shuai Wang Haizhou Li 139 11 0 15 May 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024 Hyoung-Seok Oh Sang-Hoon Lee Deok-Hyun Cho Seong-Whan Lee 501 1 0 16 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation Ziyang Ma Zhisheng Zheng Jiaxin Ye Jinchao Li Zhifu Gao Shiliang Zhang Xie Chen MDE SLR SSL 232 218 0 23 Dec 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Shivam Mehta Ruibo Tu Jonas Beskow Éva Székely G. Henter 236 167 0 06 Sep 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 H. Oh Sang-Hoon Lee Seong-Whan Lee DiffM 234 26 0 31 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Neural Information Processing Systems (NeurIPS), 2023 Matt Le Apoorv Vyas Bowen Shi Brian Karrer Leda Sari ... Mary Williamson Vimal Manohar Yossi Adi Jay Mahadeokar Wei-Ning Hsu AuLLM 237 413 0 23 Jun 2023
Disentangled Variational Autoencoder for Emotion Recognition in Conversations IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023 Kailai Yang Tianlin Zhang Sophia Ananiadou DRL 235 16 0 23 May 2023
Cluster-Level Contrastive Learning for Emotion Recognition in Conversations IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023 Kailai Yang Tianlin Zhang Hassan Alhuzali Sophia Ananiadou 164 59 0 07 Feb 2023
Flow Matching for Generative Modeling International Conference on Learning Representations (ICLR), 2022 Y. Lipman Ricky T. Q. Chen Heli Ben-Hamu Maximilian Nickel Matt Le OOD 807 2,611 0 06 Oct 2022
Speech Synthesis with Mixed Emotions IEEE Transactions on Affective Computing (IEEE TAC), 2022 Kun Zhou Berrak Sisman R. Rana B.W.Schuller Haizhou Li 280 60 0 11 Aug 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis Interspeech (Interspeech), 2022 Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Ming Jiang Linfu Xie 203 17 0 04 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022 Guangyan Zhang Ying Qin Weinan Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 287 33 0 29 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training International Conference on Learning Representations (ICLR), 2022 Sang-gil Lee Ming-Yu Liu Boris Ginsburg Bryan Catanzaro Sung-Hoon Yoon 259 367 0 09 Jun 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion Interspeech (Interspeech), 2022 Zijiang Yang Xin Jing Andreas Triantafyllopoulos Meishu Song Ilhan Aslan Björn W. Schuller 169 17 0 29 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 Johannes Wagner Andreas Triantafyllopoulos H. Wierstorf Maximilian Schmitt Felix Burkhardt F. Eyben Björn W. Schuller 291 401 0 14 Mar 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022 Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 163 93 0 17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion IEEE Transactions on Affective Computing (IEEE TAC), 2022 Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 328 73 0 10 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone International Conference on Machine Learning (ICML), 2021 Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 561 535 0 04 Dec 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 146 20 0 07 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Sanyuan Chen Chengyi Wang Zhengyang Chen Yu-Huan Wu Shujie Liu ... Yao Qian Jian Wu Micheal Zeng Xiangzhan Yu Furu Wei SSL 735 2,554 0 26 Oct 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 153 62 0 14 Sep 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Interspeech (Interspeech), 2021 Shifeng Pan Lei He 180 24 0 27 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021 Wei-Ning Hsu Benjamin Bolte Yifan Hao Kushal Lakhotia Ruslan Salakhutdinov Abdel-rahman Mohamed SSL 472 3,879 0 14 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD Speech Communication (Speech Commun.), 2021 Kun Zhou Berrak Sisman Rui Liu Haizhou Li 308 237 0 31 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech International Conference on Machine Learning (ICML), 2021 Vadim Popov Ivan Vovk Vladimir Gogoryan Tasnima Sadekova Mikhail Kudinov DiffM 286 648 0 13 May 2021
Orthogonal Projection Loss IEEE International Conference on Computer Vision (ICCV), 2021 Kanchana Ranasinghe Muzammal Naseer Munawar Hayat Salman Khan Fahad Shahbaz Khan VLM 130 84 0 25 Mar 2021
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Spoken Language Technology Workshop (SLT), 2020 Yinjiao Lei Shan Yang Lei Xie 144 61 0 17 Nov 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski Henry Zhou Abdel-rahman Mohamed Michael Auli SSL 1.1K 7,195 0 20 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 232 565 0 22 May 2020
Emotional Voice Conversion using Multitask Learning with Text-to-speech IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019 Tae-Ho Kim Sungjae Cho Shinkook Choi Sejik Park Soo-Young Lee 205 43 0 11 Nov 2019
Emotional speech synthesis with rich and granularized control IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019 Seyun Um Sangshin Oh Kyungguen Byun Inseon Jang C. Ahn Hong-Goo Kang 254 96 0 05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019 Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 150 160 0 26 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis International Conference on Learning Representations (ICLR), 2019 Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 130 48 0 03 Oct 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech Heiga Zen Viet Dang R. Clark Yu Zhang Ron J. Weiss Ye Jia Zhiwen Chen Yonghui Wu 295 1,173 0 05 Apr 2019
MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels Zhongzhe Xiao Ying-Cong Chen W. Dou Zhi Tao Liming Chen 91 9 0 30 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Zhiwen Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 559 900 0 12 Jun 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 253 880 0 23 Mar 2018
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 563 6,195 0 02 Nov 2017