BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

15 April 2022

Papers citing "BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations"

Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis Daisuke Niizumi Daiki Takeuchi Masahiro Yasuda Binh Thien Nguyen Yasunori Ohishi N. Harada 27 0 0 25 Apr 2025
Parameter-Efficient Continual Fine-Tuning: A Survey Eric Nuertey Coleman Luigi Quarantiello Ziyue Liu Qinwen Yang Samrat Mukherjee J. Hurtado Vincenzo Lomonaco CLL 27 0 0 18 Apr 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning Aurian Quélennec Pierre Chouteau Geoffroy Peeters S. Essid SSL 52 0 0 17 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling Jakob Poncelet Hugo Van hamme 67 0 0 05 Feb 2025
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection Pengfei Cai Yan Song Nan Jiang Qing Gu Ian Mcloughlin 30 2 0 26 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model Carlos Hernandez-Olivan Marc Delcroix Tsubasa Ochiai Daisuke Niizumi Naohiro Tawara Tomohiro Nakatani Shoko Araki 29 2 0 19 Sep 2024
Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation Alain Riou Stefan Lattner Gaëtan Hadjeres Michael Anslow Geoffroy Peeters 26 2 0 05 Aug 2024
Self-Supervised Embeddings for Detecting Individual Symptoms of Depression Sri Harsha Dumpala Katerina Dikaios Abraham Nunes Frank Rudzicz Rudolf Uher Sageev Oore SSL 36 1 0 25 Jun 2024
Scaling up masked audio encoder learning for general audio classification Heinrich Dinkel Zhiyong Yan Yongqing Wang Junbo Zhang Yujun Wang Bin Wang 22 2 0 11 Jun 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada Masahiro Yasuda Shunsuke Tsubaki Keisuke Imoto VLM 31 5 0 04 Jun 2024
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning Alain Riou Stefan Lattner Gaëtan Hadjeres Geoffroy Peeters 21 2 0 14 May 2024
Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion Syed Hammad Ahmed M. Khan G. Sukthankar 21 0 0 09 May 2024
Benchmarking Representations for Speech, Music, and Acoustic Events Moreno La Quatra Alkis Koudounas Lorenzo Vaiani Elena Baralis Luca Cagliero Paolo Garza Sabato Marco Siniscalchi 24 10 0 02 May 2024
Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino MedIm 19 2 0 26 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino 29 10 0 09 Apr 2024
On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations Matthew C. McCallum Matthew E. P. Davies Florian Henkel Jaehun Kim Samuel E. Sandberg 33 6 0 17 Jan 2024
Singer Identity Representation Learning using Self-Supervised Techniques Bernardo Torres Stefan Lattner Gaël Richard SSL 27 8 0 10 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild Zhi-Song Liu Robin Courant Vicky Kalogeiton 25 6 0 08 Jan 2024
Self-Supervised Learning for Few-Shot Bird Sound Classification Ilyass Moummad Romain Serizel Nicolas Farrugia SSL 11 9 0 25 Dec 2023
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings Aurian Quélennec Michel Olvera Geoffroy Peeters S. Essid 17 2 0 21 Dec 2023
Self-Supervised Learning for Anomalous Sound Detection Kevin Wilkinghoff 29 11 0 15 Dec 2023
Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer Bing Yang Xiaofei Li SSL 17 3 0 01 Dec 2023
Semi-supervised Sound Event Detection with Local and Global Consistency Regularization Yiming Li Xiangdong Wang Hong Liu Rui Tao Long Yan Kazushige Ouchi 13 3 0 15 Sep 2023
PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective Alain Riou Stefan Lattner Gaëtan Hadjeres Geoffroy Peeters 11 12 0 05 Sep 2023
Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning Ilyass Moummad Romain Serizel Nicolas Farrugia 15 2 0 02 Sep 2023
How to Scale Your EMA Dan Busbridge Jason Ramapuram Pierre Ablin Tatiana Likhomanenko Eeshan Gunesh Dhekane Xavier Suau Russ Webb 25 17 0 25 Jul 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks Xian Li Nian Shao Xiaofei Li ViT CLIP 8 24 0 07 Jun 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino 37 3 0 23 May 2023
Environmental sound synthesis from vocal imitations and sound event labels Yuki Okamoto Keisuke Imoto Shinnosuke Takamichi Ryotaro Nagase Takahiro Fukumori Y. Yamashita 13 0 0 29 Apr 2023
Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play Irmak Güzey Ben Evans Soumith Chintala Lerrel Pinto 54 64 0 21 Mar 2023
Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation Bac Nguyen Stefan Uhlich Fabien Cardinaux SSL 26 3 0 07 Mar 2023
Training one model to detect heart and lung sound events from single point auscultations Leander Melms Robert R. Ilesan Ulrich Köhler O. Hildebrandt R. Conradt ... Jürgen R. Schaefer Tobias Müller J. Obergassel Nadine Schlicker M. Hirsch 15 2 0 15 Jan 2023
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning Pritam Sarkar Ali Etemad 19 20 0 25 Nov 2022
Self-Supervised Learning for Speech Enhancement through Synthesis Bryce Irvin Marko Stamenovic M. Kegler Li-Chia Yang 27 18 0 04 Nov 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino SSL 18 29 0 26 Oct 2022
Audio Barlow Twins: Self-Supervised Audio Representation Learning Jonah Anton H. Coppock Pancham Shukla Bjorn W. Schuller BDL SSL 19 8 0 28 Sep 2022
Representation Learning for the Automatic Indexing of Sound Effects Libraries Alison B. Ma Alexander Lerch 21 0 0 18 Aug 2022
Multimodal Self-Supervised Learning of General Audio Representations Luyu Wang Pauline Luc Adrià Recasens Jean-Baptiste Alayrac Aaron van den Oord SSL 70 41 0 26 Apr 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation Yuan Gong Yu-An Chung James R. Glass VLM 99 144 0 02 Feb 2021
Multi-task self-supervised learning for Robust Speech Recognition Mirco Ravanelli Jianyuan Zhong Santiago Pascual P. Swietojanski João Monteiro J. Trmal Yoshua Bengio SSL 171 288 0 25 Jan 2020
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 261 10,196 0 16 Nov 2016