2411.02625
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
4 November 2024
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
Papers citing "EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector"
50 / 51 papers shown
Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models
Yizhou Peng
Yukun Ma
C. Zhang
Yi-Wen Chao
Chongjia Ni
B. Ma
57
0
0
15 Oct 2025
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS
Haoxun Li
Yu Liu
Yuqing Sun
Hanlei Shi
Leyuan Qu
Taihao Li
56
0
0
07 Oct 2025
LibriTTS-VI: A Public Corpus and Novel Methods for Efficient Voice Impression Control
Junki Ohmura
Yuki Ito
E. Tsunoo
Toshiyuki Sekiya
Toshiyuki Kumakura
71
0
0
19 Sep 2025
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Siyi Zhou
Yiquan Zhou
Yi He
Xun Zhou
Jinchao Wang
Wei Deng
Jingchen Shu
DiffM
127
9
0
23 Jun 2025
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Interspeech (Interspeech), 2025
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
153
0
0
26 May 2025
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling
Cheng Yifan
Zhang Ruoyi
Shi Jiatong
125
1
0
21 May 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
Guanrou Yang
Chen Yang
Qian Chen
Ziyang Ma
Wenxi Chen
...
Fan Yu
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
442
20
0
17 Apr 2025
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Jun-Hak Yun
Seung-Bin Kim
Seong-Whan Lee
DiffM
97
7
0
10 Jan 2025
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
220
10
0
25 Sep 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
214
18
0
17 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
201
134
0
26 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
177
29
0
12 Jun 2024
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Sho Inoue
Kun Zhou
Shuai Wang
Haizhou Li
139
11
0
15 May 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
501
1
0
16 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
232
218
0
23 Dec 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
236
167
0
06 Sep 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
234
26
0
31 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
237
413
0
23 Jun 2023
Disentangled Variational Autoencoder for Emotion Recognition in Conversations
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Kailai Yang
Tianlin Zhang
Sophia Ananiadou
DRL
235
16
0
23 May 2023
Cluster-Level Contrastive Learning for Emotion Recognition in Conversations
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Kailai Yang
Tianlin Zhang
Hassan Alhuzali
Sophia Ananiadou
164
59
0
07 Feb 2023
Flow Matching for Generative Modeling
International Conference on Learning Representations (ICLR), 2022
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
807
2,611
0
06 Oct 2022
Speech Synthesis with Mixed Emotions
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
280
60
0
11 Aug 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Interspeech (Interspeech), 2022
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
203
17
0
04 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Guangyan Zhang
Ying Qin
Weinan Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
287
33
0
29 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
International Conference on Learning Representations (ICLR), 2022
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
259
367
0
09 Jun 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Interspeech (Interspeech), 2022
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
169
17
0
29 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
291
401
0
14 Mar 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
163
93
0
17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
328
73
0
10 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
561
535
0
04 Dec 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
146
20
0
07 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
735
2,554
0
26 Oct 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
153
62
0
14 Sep 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Interspeech (Interspeech), 2021
Shifeng Pan
Lei He
180
24
0
27 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
472
3,879
0
14 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD
Speech Communication (Speech Commun.), 2021
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
308
237
0
31 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
286
648
0
13 May 2021
Orthogonal Projection Loss
IEEE International Conference on Computer Vision (ICCV), 2021
Kanchana Ranasinghe
Muzammal Naseer
Munawar Hayat
Salman Khan
Fahad Shahbaz Khan
VLM
130
84
0
25 Mar 2021
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Spoken Language Technology Workshop (SLT), 2020
Yinjiao Lei
Shan Yang
Lei Xie
144
61
0
17 Nov 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
1.1K
7,195
0
20 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
232
565
0
22 May 2020
Emotional Voice Conversion using Multitask Learning with Text-to-speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Tae-Ho Kim
Sungjae Cho
Shinkook Choi
Sejik Park
Soo-Young Lee
205
43
0
11 Nov 2019
Emotional speech synthesis with rich and granularized control
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
254
96
0
05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
150
160
0
26 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
International Conference on Learning Representations (ICLR), 2019
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
130
48
0
03 Oct 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
295
1,173
0
05 Apr 2019
MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels
Zhongzhe Xiao
Ying-Cong Chen
W. Dou
Zhi Tao
Liming Chen
91
9
0
30 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
559
900
0
12 Jun 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
253
880
0
23 Mar 2018
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
563
6,195
0
02 Nov 2017