
Discrete Optimal Transport and Voice Conversion

7 May 2025

Anton Selitskiy

Maitreya Kocharekar


Main: 3 pages

7 figures

Bibliography: 1 page

1 table

Abstract

In this work, we address the task of voice conversion (VC) using a vector-based interface. To align audio embeddings across speakers, we employ discrete optimal transport (OT) and approximate the transport map using the barycentric projection. Our evaluation demonstrates that this approach yields high-quality and effective voice conversion. We also perform an ablation study on the number of embeddings used, extending previous work on simple averaging of kNN and OT results. Additionally, we show that applying discrete OT as a post-processing step in audio generation can cause synthetic speech to be misclassified as real, revealing a novel and strong adversarial attack.
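The two operations named in the abstract, a discrete OT plan between two sets of embeddings and its barycentric projection, can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that the abstract does not confirm: uniform weights on all embeddings, a squared Euclidean cost, and an entropic (Sinkhorn) approximation swapped in for an exact OT solver. The function names `sinkhorn_plan` and `barycentric_projection`, the parameters `reg` and `n_iters`, and the toy 2-D "embeddings" are hypothetical and chosen only for illustration.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_plan(source, target, reg=0.1, n_iters=500):
    """Approximate a discrete OT plan with Sinkhorn iterations.

    Assumptions (not taken from the paper): uniform marginals, squared
    Euclidean cost, entropic regularization instead of an exact solver.
    """
    n, m = len(source), len(target)
    a = [1.0 / n] * n  # uniform mass on source embeddings
    b = [1.0 / m] * m  # uniform mass on target embeddings
    cost = [[sq_dist(x, y) for y in target] for x in source]
    # Normalize by the largest cost so `reg` is scale-free and exp() stays stable.
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (reg * scale)) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_projection(plan, target):
    """Map each source point to the plan-weighted mean of the target points."""
    dim = len(target[0])
    projected = []
    for row in plan:
        mass = sum(row)  # total mass leaving this source point
        projected.append(
            [sum(w * y[d] for w, y in zip(row, target)) / mass for d in range(dim)]
        )
    return projected


# Toy example: two "source speaker" vectors and two "target speaker" vectors.
source = [[0.0, 0.0], [1.0, 0.0]]
target = [[0.0, 5.0], [1.0, 5.0]]
plan = sinkhorn_plan(source, target)
converted = barycentric_projection(plan, target)
```

Each converted vector is a convex combination of target vectors, so it always lies inside the region spanned by the target speaker's embeddings. In this sketch a smaller `reg` gives a sharper plan that is closer to the unregularized one, at the cost of slower convergence.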
