In this work, we address the voice conversion (VC) task using a vector-based interface. To align audio embeddings between speakers, we employ discrete optimal transport mapping. Our evaluation results demonstrate the high quality and effectiveness of this method. Additionally, we show that applying discrete optimal transport as a post-processing step in audio generation can lead to the incorrect classification of synthetic audio as real.
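The abstract does not spell out how the discrete optimal transport mapping is computed. The sketch below shows one standard way to align two sets of embeddings. It assumes that both speakers contribute the same number n of d-dimensional embedding vectors, that each vector carries uniform mass 1/n, and that the ground cost is squared Euclidean distance. These are assumptions, not details confirmed by the paper. Under them, some optimal transport plan is a permutation matrix, so exact discrete OT reduces to a linear assignment problem, which SciPy's `linear_sum_assignment` solves. The helper names `discrete_ot_map` and `brute_force_ot_cost` are hypothetical, and the random vectors are stand-ins for real speaker embeddings.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def discrete_ot_map(source: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray, float]:
    """Hypothetical helper: align source embeddings to target embeddings via discrete OT.

    Assumptions (not from the paper): equal set sizes n, uniform weights 1/n,
    squared Euclidean ground cost. With uniform weights and equal n, the optimal
    plan can be taken to be a permutation (Birkhoff-von Neumann), so exact OT
    is a linear assignment problem.
    """
    if source.ndim != 2 or source.shape != target.shape:
        raise ValueError("source and target must both have shape (n, d)")
    # Pairwise cost matrix C[i, j] = ||x_i - y_j||^2.
    diff = source[:, None, :] - target[None, :, :]
    cost = np.einsum("ijk,ijk->ij", diff, diff)
    # For a square cost matrix, rows is 0..n-1 and cols is the optimal permutation.
    rows, cols = linear_sum_assignment(cost)
    # Transport each source vector onto its matched target vector.
    mapped = target[cols]
    # OT cost under uniform weights: mean cost over matched pairs.
    ot_cost = float(cost[rows, cols].mean())
    return mapped, cols, ot_cost


def brute_force_ot_cost(source: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical sanity check: minimum mean cost over all permutations (tiny n only)."""
    best = np.inf
    for sigma in itertools.permutations(range(len(source))):
        c = float(np.mean(np.sum((source - target[list(sigma)]) ** 2, axis=1)))
        best = min(best, c)
    return best


# Toy usage with stand-in data: the "target speaker" set is a shuffled,
# uniformly shifted copy of the "source speaker" set.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 4))
perm = rng.permutation(6)
tgt = src[perm] + 0.01
mapped, assignment, cost = discrete_ot_map(src, tgt)
```

A uniform shift does not change which permutation is optimal, so in this toy case each source vector is mapped onto its own shifted copy. For unequal set sizes or non-uniform weights, the plan is no longer a permutation in general, and one would instead solve the full OT linear program (or an entropic approximation such as Sinkhorn) and apply a barycentric projection.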
@article{selitskiy2025_2505.04382,
  title={Discrete Optimal Transport and Voice Conversion},
  author={Anton Selitskiy and Maitreya Kocharekar},
  journal={arXiv preprint arXiv:2505.04382},
  year={2025}
}