Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet

13 July 2025

Wenmiao Gao

Han Yin



Main:4 Pages

2 Figures

Bibliography:2 Pages

Abstract

Pre-training methods have greatly improved the performance of sound event localization and detection (SELD). However, existing Transformer-based models still incur high computational cost. To address this problem, we present a stereo SELD system that combines a pre-trained PSELDnet with a bidirectional Mamba (BiMamba) sequence model. Specifically, we replace the Conformer module with a BiMamba module. We also use asymmetric convolutions to better capture the time and frequency relationships in the audio signal. Experimental results on the DCASE2025 Task 3 development dataset show that our method outperforms both the baseline and the original PSELDnet with a Conformer decoder. In addition, the proposed model requires fewer computing resources than the baselines. These results show that the BiMamba architecture is effective at addressing key challenges in SELD tasks. The source code is publicly available at alexandergwm/DCASE2025 TASK3 Stereo PSELD Mamba.
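The bidirectional idea behind a BiMamba-style module can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar, fixed-parameter linear recurrence in place of Mamba's learned, input-dependent (selective) state-space scan, and it assumes the two directions are merged by addition. The names `linear_scan` and `bidirectional_scan` and the parameters `a`, `b`, `c` are hypothetical.

```python
from typing import List


def linear_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, starting from h = 0.

    Simplifying assumption: a, b, c are fixed scalars. A real Mamba layer
    makes these quantities depend on the input at every time step.
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t
        y.append(c * h)
    return y


def bidirectional_scan(x: List[float], a: float = 0.5, b: float = 1.0,
                       c: float = 1.0) -> List[float]:
    """Scan the sequence in both directions and sum the two outputs.

    The backward pass runs the same causal scan on the reversed sequence,
    then reverses the result so it lines up with the original time axis.
    Summation is an assumed merge rule, chosen only for illustration.
    """
    forward = linear_scan(x, a, b, c)
    backward = linear_scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# An impulse in the middle frame: the causal scan only influences later
# frames, while the bidirectional scan spreads context to both sides.
frames = [0.0, 0.0, 1.0, 0.0, 0.0]
print(linear_scan(frames, 0.5, 1.0, 1.0))  # [0.0, 0.0, 1.0, 0.5, 0.25]
print(bidirectional_scan(frames))          # [0.25, 0.5, 2.0, 0.5, 0.25]
```

Each direction costs time linear in the number of frames, in contrast to the quadratic cost of full self-attention over the same sequence.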
