
Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet

Wenmiao Gao
Han Yin
Main:4 Pages
2 Figures
Bibliography:2 Pages
Abstract

Pre-training methods have greatly improved the performance of sound event localization and detection (SELD). However, existing Transformer-based models still incur a high computational cost. To address this problem, we present a stereo SELD system that combines a pre-trained PSELDnet with a bidirectional Mamba (BiMamba) sequence model. Specifically, we replace the Conformer module with a BiMamba module. We also use asymmetric convolutions to better capture the time and frequency relationships in the audio signal. Experimental results on the DCASE2025 Task 3 development dataset show that our method outperforms both the baseline and the original PSELDnet with a Conformer decoder. In addition, the proposed model requires fewer computational resources than the baselines. These results indicate that the BiMamba architecture is effective for addressing key challenges in SELD tasks. The source code is publicly accessible at this https URL (alexandergwm/DCASE2025 TASK3 Stereo PSELD Mamba).
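To illustrate the bidirectional idea behind a BiMamba-style module, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a fixed scalar linear recurrence in place of Mamba's input-dependent (selective) state-space layer, and it assumes the forward and backward passes are merged by summation, which the abstract does not specify. The names `linear_scan`, `bidirectional_scan`, `a`, and `b` are hypothetical.

```python
from typing import List


def linear_scan(x: List[float], a: float, b: float) -> List[float]:
    """Causal linear recurrence: h[t] = a * h[t-1] + b * x[t], starting from h = 0.

    A simplified stand-in for a state-space layer (assumption: fixed scalar a, b).
    """
    h = 0.0
    out = []
    for value in x:
        h = a * h + b * value
        out.append(h)
    return out


def bidirectional_scan(x: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Run the same recurrence left-to-right and right-to-left, then sum.

    Summation as the merge rule is an assumption made for illustration.
    """
    forward = linear_scan(x, a, b)
    # Reverse the input, scan, then reverse back so time steps line up.
    backward = linear_scan(x[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # An impulse at the first frame: the forward pass carries it to later
    # frames, while the backward pass only sees it at frame 0.
    print(bidirectional_scan([1.0, 0.0, 0.0]))
```

Each scan takes time linear in the sequence length, whereas self-attention in a Conformer block scales quadratically. This is the general motivation for state-space models as a lower-cost sequence module.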
