Synthetic data enables context-aware bioacoustic sound event detection

Abstract

We propose a methodology for training foundation models that enhances their in-context learning capabilities in the domain of bioacoustic signal processing. Our approach relies on synthetically generated training data: we introduce a domain-randomization pipeline that constructs diverse acoustic scenes with temporally strong labels. We generate over 8,800 hours of strongly labeled audio and train a query-by-example, transformer-based model to perform few-shot bioacoustic sound event detection. Our second contribution is a public benchmark of 13 diverse few-shot bioacoustics tasks. Our model outperforms previously published methods by 49%, and we demonstrate that this gain is due to both model design and data scale. We make our trained model available via an API to provide ecologists and ethologists with a training-free tool for bioacoustic sound event detection.
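
To make concrete what "diverse acoustic scenes with temporally strong labels" means in practice, the sketch below shows one common form of domain-randomized scene synthesis: isolated vocalization clips are mixed into background recordings at random positions and signal-to-noise ratios, and the exact onset/offset of each placed event is recorded as a strong label. The sample rate, parameter ranges, and function names here are illustrative assumptions, not the authors' pipeline.

    # A minimal sketch of domain-randomized scene synthesis, assuming a pool of
    # isolated vocalization clips and background recordings as NumPy arrays.
    # Parameter ranges and names are assumptions for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    SR = 16_000  # sample rate in Hz (assumed)

    def mix_scene(background, events, snr_db_range=(-5.0, 20.0)):
        """Place each event clip at a random offset in the background at a random SNR.

        Returns the mixed waveform and a list of (onset_s, offset_s) strong labels.
        """
        scene = background.copy()
        labels = []
        for clip in events:
            start = int(rng.integers(0, len(background) - len(clip)))
            snr_db = rng.uniform(*snr_db_range)
            # Scale the event so its power relative to the local background
            # matches the sampled target SNR.
            bg_power = np.mean(background[start:start + len(clip)] ** 2) + 1e-12
            ev_power = np.mean(clip ** 2) + 1e-12
            gain = np.sqrt(bg_power / ev_power * 10 ** (snr_db / 10))
            scene[start:start + len(clip)] += gain * clip
            labels.append((start / SR, (start + len(clip)) / SR))
        return scene, labels

    # Example: a 10 s noise background with two 0.5 s synthetic "calls".
    background = 0.01 * rng.standard_normal(10 * SR)
    calls = [0.1 * np.sin(2 * np.pi * 2_000 * np.arange(SR // 2) / SR) for _ in range(2)]
    audio, strong_labels = mix_scene(background, calls)
    print(strong_labels)  # two (onset_s, offset_s) pairs, each 0.5 s long

Because every event is placed programmatically, the labels are exact by construction, which is what lets this approach scale to thousands of strongly labeled hours without manual annotation.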

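The "query-by-example" framing also has a simple inference pattern worth spelling out: a few labeled support events are embedded, and windows of a long recording are scored by similarity to the support prototype. The sketch below illustrates this pattern only; the embed function stands in for the paper's transformer encoder, and the window size, hop, and threshold are assumptions, not the authors' settings.

    # A hedged sketch of query-by-example sound event detection.
    # `embed`, window/hop sizes, and the threshold are illustrative assumptions.
    import numpy as np

    def detect(embed, support_clips, recording, sr, win_s=1.0, hop_s=0.25, threshold=0.5):
        """Return (onset_s, offset_s) windows whose embedding is close to the support prototype."""
        proto = np.mean([embed(c) for c in support_clips], axis=0)
        proto /= np.linalg.norm(proto) + 1e-12
        win, hop = int(win_s * sr), int(hop_s * sr)
        detections = []
        for start in range(0, len(recording) - win + 1, hop):
            z = embed(recording[start:start + win])
            score = float(z @ proto) / (np.linalg.norm(z) + 1e-12)
            if score > threshold:
                detections.append((start / sr, (start + win) / sr))
        return detections

    # Toy stand-in encoder (truncated magnitude spectrum), just to make this runnable:
    embed = lambda x: np.abs(np.fft.rfft(x, n=1024))[:128]

The practical appeal for ecologists and ethologists is that adapting to a new species or call type requires only a handful of labeled support examples, with no fine-tuning.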
@article{hoffman2025_2503.00296,
  title={Synthetic data enables context-aware bioacoustic sound event detection},
  author={Benjamin Hoffman and David Robinson and Marius Miron and Vittorio Baglione and Daniela Canestrari and Damian Elias and Eva Trapote and Olivier Pietquin},
  journal={arXiv preprint arXiv:2503.00296},
  year={2025}
}