EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024

4 November 2024

Papers citing "EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector"

50 / 51 papers shown

Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models

15 Oct 2025

EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS

07 Oct 2025

LibriTTS-VI: A Public Corpus and Novel Methods for Efficient Voice Impression Control

104

19 Sep 2025

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

197

23 Jun 2025

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech

Interspeech (Interspeech), 2025

180

26 May 2025

MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling

Cheng Yifan

Zhang Ruoyi

Shi Jiatong

164

21 May 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

...

504

17 Apr 2025

FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

147

10 Jan 2025

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Kun Zhou

You Zhang

Hao Wang

...

Bin Ma

264

25 Sep 2024

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

...

252

17 Jul 2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Sefik Emre Eskimez

Xiaofei Wang

Manthan Thakker

Canrun Li

Chung-Hsien Tsai

...

Min Tang

Xu Tan

Yanqing Liu

Sheng Zhao

Naoyuki Kanda

275

143

26 Jun 2024

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

Deok-Hyeon Cho

Hyung-Seok Oh

Seung-Bin Kim

Sang-Hoon Lee

Seong-Whan Lee

199

12 Jun 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Sho Inoue

Kun Zhou

Shuai Wang

Haizhou Li

172

15 May 2024

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024

576

16 Jan 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Xie Chen

301

235

23 Dec 2023

Matcha-TTS: A fast TTS architecture with conditional flow matching

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

304

175

06 Sep 2023

DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

256

31 Jul 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Neural Information Processing Systems (NeurIPS), 2023

...

Yossi Adi

297

425

23 Jun 2023

Disentangled Variational Autoencoder for Emotion Recognition in Conversations

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023

276

23 May 2023

Cluster-Level Contrastive Learning for Emotion Recognition in Conversations

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023

209

07 Feb 2023

Flow Matching for Generative Modeling

International Conference on Learning Representations (ICLR), 2022

1.1K

2,869

06 Oct 2022

Speech Synthesis with Mixed Emotions

IEEE Transactions on Affective Computing (IEEE TAC), 2022

Haizhou Li

316

11 Aug 2022

Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis

Interspeech (Interspeech), 2022

216

04 Jul 2022

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

293

29 Jun 2022

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

International Conference on Learning Representations (ICLR), 2022

Boris Ginsburg

307

379

09 Jun 2022

An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion

Interspeech (Interspeech), 2022

Zijiang Yang

Xin Jing

Andreas Triantafyllopoulos

Meishu Song

Ilhan Aslan

Björn W. Schuller

197

29 Mar 2022

Dawn of the transformer era in speech emotion recognition: closing the valence gap

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Johannes Wagner

Andreas Triantafyllopoulos

Björn W. Schuller

382

406

14 Mar 2022

MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Yinjiao Lei

Shan Yang

Xinsheng Wang

Lei Xie

195

17 Jan 2022

Emotion Intensity and its Control for Emotional Voice Conversion

IEEE Transactions on Affective Computing (IEEE TAC), 2022

Kun Zhou

Berrak Sisman

R. Rana

Björn W. Schuller

Haizhou Li

342

10 Jan 2022

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

International Conference on Machine Learning (ICML), 2021

Edresson Casanova

Julian Weber

C. Shulby

Arnaldo Cândido Júnior

Eren Golge

M. Ponti

673

547

04 Dec 2021

Emotional Prosody Control for Speech Generation

S. Sivaprasad

Saiteja Kosgi

Vineet Gandhi

178

07 Nov 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

...

Jian Wu

1.1K

2,642

26 Oct 2021

Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis

192

14 Sep 2021

Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis

Interspeech (Interspeech), 2021

Shifeng Pan

Lei He

196

27 Jul 2021

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021

540

3,993

14 Jun 2021

Emotional Voice Conversion: Theory, Databases and ESD

Speech Communication (Speech Commun.), 2021

Kun Zhou

Berrak Sisman

Rui Liu

Haizhou Li

382

243

31 May 2021

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

International Conference on Machine Learning (ICML), 2021

391

660

13 May 2021

Orthogonal Projection Loss

IEEE International Conference on Computer Vision (ICCV), 2021

Salman Khan

206

25 Mar 2021

Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis

Spoken Language Technology Workshop (SLT), 2020

Yinjiao Lei

Shan Yang

Lei Xie

188

17 Nov 2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

2.4K

7,351

20 Jun 2020

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

266

576

22 May 2020

Emotional Voice Conversion using Multitask Learning with Text-to-speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

217

11 Nov 2019

Emotional speech synthesis with rich and granularized control

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

290

05 Nov 2019

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

175

161

26 Oct 2019

Semi-Supervised Generative Modeling for Controllable Speech Synthesis

International Conference on Learning Representations (ICLR), 2019

162

03 Oct 2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

322

1,187

05 Apr 2019

MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels

131

30 Aug 2018

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

...

665

911

12 Jun 2018

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Yuxuan Wang

Rif A. Saurous

283

886

23 Mar 2018

Neural Discrete Representation Learning

649

6,365

02 Nov 2017