
Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control

Masato Murata
Koichi Miyazaki
Tomoki Koriyama
Main: 4 pages
4 figures
Bibliography: 1 page
1 table
Abstract

Cross-speaker emotion intensity control aims to generate emotional speech of a target speaker with desired emotion intensities using only that speaker's neutral speech. A recently proposed method, emotion arithmetic, achieves emotion intensity control using a single-speaker emotion vector. Although this method has shown promising results in the same-speaker setting, it loses speaker consistency in the cross-speaker setting because of mismatches between the emotion vectors of the source and target speakers. To overcome this limitation, we propose a speaker-agnostic emotion vector designed to capture emotional expressions shared across multiple speakers, which makes it applicable to arbitrary speakers. Experimental results demonstrate that the proposed method achieves cross-speaker emotion intensity control while maintaining speaker consistency, speech quality, and controllability, even for unseen speakers.
