Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control

4 July 2025

Masato Murata

Koichi Miyazaki

Tomoki Koriyama

Main: 4 Pages

4 Figures

Bibliography: 1 Page

1 Table

Abstract

Cross-speaker emotion intensity control aims to generate emotional speech for a target speaker at a desired emotion intensity using only that speaker's neutral speech. A recently proposed method, emotion arithmetic, achieves emotion intensity control using a single-speaker emotion vector. Although this method has shown promising results in the same-speaker setting, it loses speaker consistency in the cross-speaker setting because the emotion vectors of the source and target speakers are mismatched. To overcome this limitation, we propose a speaker-agnostic emotion vector designed to capture emotional expressions shared across multiple speakers, which makes it applicable to arbitrary speakers. Experimental results demonstrate that the proposed method achieves cross-speaker emotion intensity control while maintaining speaker consistency, speech quality, and controllability, even for unseen speakers.
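
The abstract does not specify how the emotion vectors are computed, so the following is only a minimal sketch of one plausible reading. It assumes an emotion vector is the parameter difference between an emotional model and a neutral model of the same speaker, that a speaker-agnostic vector can be approximated by averaging such differences over several speakers, and that intensity is a scalar multiplier on the vector. The function names (`emotion_vector`, `speaker_agnostic_vector`, `apply_emotion`), the averaging step, and the toy parameters are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical stand-in for model weights: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def emotion_vector(neutral: Params, emotional: Params) -> Params:
    """Assumed definition: emotional weights minus neutral weights, per parameter."""
    return {
        name: [e - n for n, e in zip(neutral[name], emotional[name])]
        for name in neutral
    }


def speaker_agnostic_vector(vectors: List[Params]) -> Params:
    """Assumed aggregation: element-wise mean of per-speaker emotion vectors."""
    count = len(vectors)
    return {
        name: [sum(vals) / count for vals in zip(*(v[name] for v in vectors))]
        for name in vectors[0]
    }


def apply_emotion(target_neutral: Params, vector: Params, intensity: float) -> Params:
    """Add the emotion vector, scaled by intensity, to a target speaker's neutral weights."""
    return {
        name: [w + intensity * d for w, d in zip(target_neutral[name], vector[name])]
        for name in target_neutral
    }


# Toy example with two source speakers and one target speaker.
vec_a = emotion_vector({"w": [0.0, 0.0]}, {"w": [2.0, 0.0]})
vec_b = emotion_vector({"w": [1.0, 1.0]}, {"w": [1.0, 3.0]})
shared = speaker_agnostic_vector([vec_a, vec_b])
half_intensity = apply_emotion({"w": [5.0, 5.0]}, shared, intensity=0.5)
```

Under these assumptions, an intensity of 0 leaves the target speaker's neutral parameters unchanged, and larger values move them further along the shared emotion direction.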
