Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation

We present Unified Microphone Conversion, a unified generative framework designed to bolster sound event classification (SEC) systems against device variability. While our prior CycleGAN-based methods effectively simulate device characteristics, they require a separate model for each device pair, limiting scalability. Our approach overcomes this constraint by conditioning the generator on frequency response information, enabling many-to-many device mappings through unpaired training. We inject the frequency-response information via Feature-wise Linear Modulation, further improving scalability. Additionally, incorporating synthetic frequency-response differences improves the framework's applicability to real-world settings. Experimental results show that our method outperforms the state-of-the-art method by 2.6% and reduces performance variability by 0.8% in macro-average F1 score.
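To make the conditioning mechanism concrete, the sketch below shows a generic Feature-wise Linear Modulation (FiLM) layer: per-channel scale and shift parameters are predicted from a device frequency-response vector and applied to the generator's intermediate feature maps. The module and layer sizes, the conditioning dimension, and the usage example are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal FiLM conditioning sketch (PyTorch). Assumes a spectrogram-based
# generator whose feature maps have shape (batch, channels, freq, time) and a
# fixed-size frequency-response conditioning vector; both are assumptions.
import torch
import torch.nn as nn


class FiLM(nn.Module):
    """Predicts per-channel (gamma, beta) from a conditioning vector and
    applies them to a feature map of shape (B, C, F, T)."""

    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)  # (B, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)                # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta                           # modulated features


# Hypothetical usage inside one generator block: `freq_response` stands in for
# the (difference of) device frequency responses used as the condition.
conv = nn.Conv2d(1, 64, kernel_size=3, padding=1)
film = FiLM(cond_dim=128, num_channels=64)

spectrogram = torch.randn(4, 1, 128, 256)  # (batch, 1, mel bins, frames)
freq_response = torch.randn(4, 128)        # conditioning vector (assumed dim)

h = conv(spectrogram)
h = film(h, freq_response)                 # device-conditioned feature maps
```

Because the condition is an input rather than a fixed device pair, a single FiLM-conditioned generator can in principle cover many source-target device combinations, which is the scalability argument made in the abstract.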