Audio Self-supervised Learning: A Survey

2 March 2022

Shuo Liu

Adria Mallol-Ragolta

Emilia Parada-Cabaleiro

Kun Qian

Bjoern W. Schuller

Papers citing "Audio Self-supervised Learning: A Survey"

50 / 78 papers shown

Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis Radek Daněček Carolin Schmitt Senya Polikovsky Michael J. Black 22 0 0 18 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness Yusheng Zhao Junyu Luo Xiao Luo Weizhi Zhang Zhiping Xiao Wei Ju Philip S. Yu Ming Zhang AuLLM 32 0 0 03 Apr 2025
Heterogeneous bimodal attention fusion for speech emotion recognition Jiachen Luo Huy Phan Lin Wang Joshua Reiss 42 0 0 09 Mar 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning Aurian Quélennec Pierre Chouteau Geoffroy Peeters S. Essid SSL 47 0 0 17 Feb 2025
Evaluation of Deep Audio Representations for Hearables Fabian Gröger Pascal Baumann L. Amruthalingam Laurent Simon Ruksana Giurda Simone Lionetti 72 0 0 10 Feb 2025
A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges Aitor Sánchez-Ferrera Borja Calvo Jose A. Lozano AI4TS 35 0 0 28 Jan 2025
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder Maheswar Bora Saurabh Atreya Aritra Mukherjee Abhijit Das 68 0 0 19 Nov 2024
BSS-CFFMA: Cross-Domain Feature Fusion and Multi-Attention Speech Enhancement Network based on Self-Supervised Embedding Alimjan Mattursun Liejun Wang Yinfeng Yu 20 2 0 13 Aug 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models Andreas Triantafyllopoulos Iosif Tsangko Alexander Gebhard A. Mesaros Tuomas Virtanen Björn Schuller 39 4 0 22 Jul 2024
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models Chun Yin Tai-Shih Chi Yu Tsao Hsin-Min Wang 27 0 0 12 Jun 2024
The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition Shahin Amiriparian Lukas Christ Alexander Kathan Maurice Gerczuk Niklas Muller ... Lukas Stappen Andreas Konig Erik Cambria Björn Schuller Simone Eulitz 27 8 0 11 Jun 2024
A review on discriminative self-supervised learning methods Nikolaos Giakoumoglou Tania Stathaki SSL 34 0 0 08 May 2024
Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model Zhonglong Chen Changwei Song Yining Chen Jianqiang Li Guanghui Fu Yongsheng Tong Qing Zhao AI4MH 24 0 0 07 May 2024
Self-supervised visual learning in the low-data regime: a comparative evaluation Sotirios Konstantakos Despina Ioanna Chalkiadaki Ioannis Mademlis Yuki M. Asano E. Gavves Georgios Th. Papadopoulos 19 6 0 26 Apr 2024
Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition Carlos Peñarrubia Carlos Garrido-Munoz J. J. Valero-Mas Jorge Calvo-Zaragoza 19 1 0 17 Apr 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization Wei-Ping Huang Sung-Feng Huang Hung-yi Lee 16 0 0 23 Jan 2024
A novel dual-stream time-frequency contrastive pretext tasks framework for sleep stage classification Sergio Kazatzidis S. Mehrkanoon AI4TS 6 1 0 15 Dec 2023
Self-Supervised Learning for Anomalous Sound Detection Kevin Wilkinghoff 21 11 0 15 Dec 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces Heng-Jui Chang James R. Glass 10 3 0 15 Nov 2023
Rethinking Samples Selection for Contrastive Learning: Mining of Potential Samples Hengkui Dong Xianzhong Long Yun Li 22 2 0 01 Nov 2023
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition Isaac Slaughter Craig Greenberg Reva Schwartz Aylin Caliskan 14 4 0 29 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction Jiatong Shi H. Inaguma Xutai Ma Ilia Kulikov Anna Y. Sun 31 24 0 04 Oct 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech Titouan Parcollet H. Nguyen Solène Evain Marcely Zanon Boito Adrien Pupier ... François Portet Solange Rossato F. Ringeval D. Schwab Laurent Besacier 32 14 0 11 Sep 2023
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction Yusuf Brima U. Krumnack Simone Pika Gunther Heidemann SSL 11 0 0 07 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook S. Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Yi Ren ... Wenwu Wang Xulong Zhang Roberto Togneri Erik Cambria Björn W. Schuller LM&MA AuLLM 24 36 0 24 Aug 2023
Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations Yuewei Yang Hai Helen Li Yiran Chen CML OOD 17 1 0 16 Aug 2023
Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers Lukas Rauch Raphael Schwinger Moritz Wirth Bernhard Sick Sven Tomforde Christoph Scholz 14 4 0 14 Aug 2023
Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks João A. Leite Carolina Scarton D. F. Silva 14 0 0 31 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition Weidong Chen Xiaofen Xing Peihao Chen Xiangmin Xu VLM 10 34 0 20 Jul 2023
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development Yanir Marmor Kinneret Misgav Y. Lifshitz VLM 4 3 0 17 Jul 2023
Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects Kexin Zhang Qingsong Wen Chaoli Zhang Rongyao Cai Ming Jin ... James Y. Zhang Y. Liang Guansong Pang Dongjin Song Shirui Pan AI4TS 109 97 0 16 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks Xian Li Nian Shao Xiaofei Li ViT CLIP 8 24 0 07 Jun 2023
N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition Bashar Talafha Abdul Waheed Muhammad Abdul-Mageed 11 7 0 05 Jun 2023
MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations Calum Heggan Timothy M. Hospedales S. Budgett Mehrdad Yaghoobi SSL 10 5 0 29 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition Haomiao Yang Jinming Zhao Gholamreza Haffari Ehsan Shareghi 9 6 0 28 May 2023
Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders Ali Siahkoohi Rudy Morel Randall Balestriero Erwan Allys G. Sainton Taichi Kawamura Maarten V. de Hoop 11 2 0 25 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino 29 3 0 23 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation Kangwook Jang Sungnyun Kim Se-Young Yun Hoi-Rim Kim 8 5 0 19 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning Alexander H. Liu Heng-Jui Chang Michael Auli Wei-Ning Hsu James R. Glass 11 24 0 17 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps Yanfang Li Huan Wang Muxia Sun LM&MA AI4TS AI4CE 11 44 0 10 May 2023
The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation Lukas Christ Shahin Amiriparian Alice Baird Alexander Kathan Niklas Muller ... Eva-Maria Messner Andreas Konig Alan S. Cowen Erik Cambria Björn W. Schuller 6 30 0 05 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition Samir Sadok Simon Leglaive Renaud Séguier SSL 52 6 0 05 May 2023
A Cookbook of Self-Supervised Learning Randall Balestriero Mark Ibrahim Vlad Sobal Ari S. Morcos Shashank Shekhar ... Pierre Fernandez Amir Bar Hamed Pirsiavash Yann LeCun Micah Goldblum SyDa FedML SSL 31 270 0 24 Apr 2023
A vector quantized masked autoencoder for speech emotion recognition Samir Sadok Simon Leglaive Renaud Séguier 8 20 0 21 Apr 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? Chaoning Zhang Chenshuang Zhang Sheng Zheng Yu Qiao Chenghao Li ... Lik-Hang Lee Yang Yang Heng Tao Shen In So Kweon Choong Seon Hong 72 152 0 21 Mar 2023
Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation Bac Nguyen Stefan Uhlich Fabien Cardinaux SSL 20 3 0 07 Mar 2023
Phone and speaker spatial organization in self-supervised speech representations Pablo Riera M. Cerdeiro L. Pepino Luciana Ferrer SSL 8 1 0 24 Feb 2023
Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data Ali Siahkoohi Rudy Morel Maarten V. de Hoop Erwan Allys G. Sainton Taichi Kawamura 8 4 0 27 Jan 2023
DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech Kazuki Kawamura Jun Rekimoto 12 0 0 08 Dec 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing Yonggan Fu Yang Zhang Kaizhi Qian Zhifan Ye Zhongzhi Yu Cheng-I Jeff Lai Yingyan Lin 6 8 0 02 Nov 2022