EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
arXiv: 2411.02625, v2 (latest)

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
4 November 2024
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee

Papers citing "EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector"

50 / 51 papers shown
Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models
Yizhou Peng
Yukun Ma
C. Zhang
Yi-Wen Chao
Chongjia Ni
B. Ma
77
0
0
15 Oct 2025
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS
Haoxun Li
Yu Liu
Yuqing Sun
Hanlei Shi
Leyuan Qu
Taihao Li
60
0
0
07 Oct 2025
LibriTTS-VI: A Public Corpus and Novel Methods for Efficient Voice Impression Control
Junki Ohmura
Yuki Ito
E. Tsunoo
Toshiyuki Sekiya
Toshiyuki Kumakura
87
0
0
19 Sep 2025
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Siyi Zhou
Yiquan Zhou
Yi He
Xun Zhou
Jinchao Wang
Wei Deng
Jingchen Shu
DiffM
163
14
0
23 Jun 2025
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Interspeech (Interspeech), 2025
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
169
0
0
26 May 2025
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling
Cheng Yifan
Zhang Ruoyi
Shi Jiatong
141
1
0
21 May 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
Guanrou Yang
Chen Yang
Qian Chen
Ziyang Ma
Wenxi Chen
...
Fan Yu
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
474
21
0
17 Apr 2025
FlowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Jun-Hak Yun
Seung-Bin Kim
Seong-Whan Lee
DiffM
125
7
0
10 Jan 2025
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
240
10
0
25 Sep 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
230
18
0
17 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
263
138
0
26 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
193
30
0
12 Jun 2024
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Sho Inoue
Kun Zhou
Shuai Wang
Haizhou Li
167
12
0
15 May 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
533
1
0
16 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
268
225
0
23 Dec 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
260
171
0
06 Sep 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
238
26
0
31 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
265
417
0
23 Jun 2023
Disentangled Variational Autoencoder for Emotion Recognition in Conversations
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Kailai Yang
Tianlin Zhang
Sophia Ananiadou
DRL
263
16
0
23 May 2023
Cluster-Level Contrastive Learning for Emotion Recognition in Conversations
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Kailai Yang
Tianlin Zhang
Hassan Alhuzali
Sophia Ananiadou
196
60
0
07 Feb 2023
Flow Matching for Generative Modeling
International Conference on Learning Representations (ICLR), 2022
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
1.0K
2,757
0
06 Oct 2022
Speech Synthesis with Mixed Emotions
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
296
61
0
11 Aug 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Interspeech (Interspeech), 2022
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
211
18
0
04 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Guangyan Zhang
Ying Qin
Weinan Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
291
34
0
29 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
International Conference on Learning Representations (ICLR), 2022
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
279
376
0
09 Jun 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Interspeech (Interspeech), 2022
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
181
17
0
29 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
365
401
0
14 Mar 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
195
93
0
17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
332
74
0
10 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
577
540
0
04 Dec 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
178
20
0
07 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
989
2,597
0
26 Oct 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
161
62
0
14 Sep 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Interspeech (Interspeech), 2021
Shifeng Pan
Lei He
184
25
0
27 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
504
3,946
0
14 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD
Speech Communication (Speech Commun.), 2021
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
356
241
0
31 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
335
651
0
13 May 2021
Orthogonal Projection Loss
IEEE International Conference on Computer Vision (ICCV), 2021
Kanchana Ranasinghe
Muzammal Naseer
Munawar Hayat
Salman Khan
Fahad Shahbaz Khan
VLM
182
87
0
25 Mar 2021
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Spoken Language Technology Workshop (SLT), 2020
Yinjiao Lei
Shan Yang
Lei Xie
164
61
0
17 Nov 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
1.7K
7,304
0
20 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
248
572
0
22 May 2020
Emotional Voice Conversion using Multitask Learning with Text-to-speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Tae-Ho Kim
Sungjae Cho
Shinkook Choi
Sejik Park
Soo-Young Lee
209
43
0
11 Nov 2019
Emotional speech synthesis with rich and granularized control
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
287
97
0
05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
166
160
0
26 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
International Conference on Learning Representations (ICLR), 2019
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
162
48
0
03 Oct 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
303
1,180
0
05 Apr 2019
MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels
Zhongzhe Xiao
Ying-Cong Chen
W. Dou
Zhi Tao
Liming Chen
131
9
0
30 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
591
903
0
12 Jun 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
273
881
0
23 Mar 2018
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
619
6,272
0
02 Nov 2017