2103.06695
Cited By
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
11 March 2021
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
Papers citing
"BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation"
50 / 103 papers shown
Title
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
29
1
0
28 Feb 2025
MCLRL: A Multi-Domain Contrastive Learning with Reinforcement Learning Framework for Few-Shot Modulation Recognition
Dongwei Xu
Yutao Zhu
Yao Lu
Youpeng Feng
Yun Lin
Qi Xuan
69
0
0
26 Feb 2025
Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder
Zhenghao Xi
Yuchao Shao
Yang Zheng
Xiang Liu
Yaqi Liu
Yitong Cai
55
0
0
24 Feb 2025
Phoneme-Level Contrastive Learning for User-Defined Keyword Spotting with Flexible Enrollment
Li Kewei
Zhou Hengshun
Shen Kai
Dai Yusheng
Du Jun
31
1
0
31 Dec 2024
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Hugo Thimonier
José Lucas De Melo Costa
Fabrice Popineau
Arpad Rimmel
Bich-Liên Doan
53
1
0
07 Oct 2024
BiSSL: A Bilevel Optimization Framework for Enhancing the Alignment Between Self-Supervised Pre-Training and Downstream Fine-Tuning
Gustav Wagner Zakarias
Lars Kai Hansen
Z. Tan
22
0
0
03 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
39
2
0
02 Oct 2024
Self-supervised Learning for Acoustic Few-Shot Classification
Jingyong Liang
Bernd Meyer
Issac Ning Lee
Thanh-Toan Do
SSL
47
0
0
15 Sep 2024
Domain-Invariant Representation Learning of Bird Sounds
Ilyass Moummad
Romain Serizel
Emmanouil Benetos
Nicolas Farrugia
SSL
27
2
0
13 Sep 2024
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
22
3
0
11 Sep 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
30
1
0
29 Aug 2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Ioannis Maniadis Metaxas
Georgios Tzimiropoulos
Ioannis Patras
SSL
27
0
0
15 Jul 2024
STONE: Self-supervised Tonality Estimator
Yuexuan Kong
Vincent Lostanlen
Gabriel Meseguer-Brocal
Stella Wong
Mathieu Lagrange
Romain Hennequin
26
1
0
10 Jul 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL
MedIm
35
3
0
04 Jul 2024
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Yuwei Zhang
Tong Xia
Jing Han
Yu Wu
Georgios Rizos
Yang Liu
Mohammed Mosuily
Jagmohan Chauhan
Cecilia Mascolo
AI4CE
31
6
0
23 Jun 2024
Predicting Heart Activity from Speech using Data-driven and Knowledge-based features
Gasser Elbanna
Z. Mostaani
Mathew Magimai.-Doss
SSL
30
0
0
10 Jun 2024
Contrastive Learning from Synthetic Audio Doppelgängers
Manuel Cherep
Nikhil Singh
24
1
0
09 Jun 2024
UrBAN: Urban Beehive Acoustics and PheNotyping Dataset
Mahsa Abdollahi
Yi Zhu
Heitor R. Guimarães
Nico Coallier
Ségolène Maucourt
Pierre Giovenazzo
Tiago H. Falk
18
0
0
05 Jun 2024
Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation
Jared Mejia
Victoria Dean
Tess Hellebrekers
Abhinav Gupta
35
12
0
14 May 2024
Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
MedIm
17
2
0
26 Apr 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
28
1
0
21 Apr 2024
Learning Tracking Representations from Single Point Annotations
Qiangqiang Wu
Antoni B. Chan
25
1
0
15 Apr 2024
An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
Gabriel Meseguer-Brocal
Dorian Desblancs
Romain Hennequin
SSL
25
3
0
14 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
29
3
0
12 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
29
10
0
09 Apr 2024
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
Afrina Tabassum
Dung N. Tran
Trung D. Q. Dang
Ismini Lourentzou
K. Koishida
29
0
0
14 Mar 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
25
6
0
08 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
33
17
0
07 Jan 2024
Self-Supervised Learning for Anomalous Sound Detection
Kevin Wilkinghoff
29
11
0
15 Dec 2023
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Zhixi Cai
Shreya Ghosh
Aman Pankaj Adatia
Munawar Hayat
Abhinav Dhall
Kalin Stefanov
11
26
0
26 Nov 2023
MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation
Kelin Yu
Yunhai Han
Matthew Zhu
Vaibhav Saxena
Danfei Xu
Ye Zhao
11
11
0
25 Oct 2023
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Marco Comunità
R. F. Gramaccioni
Emilian Postolache
Emanuele Rodolà
Danilo Comminiello
Joshua D. Reiss
DiffM
19
16
0
23 Oct 2023
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
19
3
0
28 Sep 2023
Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
SSL
8
1
0
27 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
44
13
0
19 Sep 2023
Do learned speech symbols follow Zipf's law?
Shinnosuke Takamichi
Hiroki Maeda
Joonyong Park
Daisuke Saito
Hiroshi Saruwatari
13
1
0
18 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
L. Pepino
Pablo Riera
Luciana Ferrer
14
4
0
14 Sep 2023
Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals
Louis Blankemeier
Sebastien Baur
Wei-Hung Weng
Jake Garrison
Yossi Matias
Shruthi Prabhakara
Diego Ardila
Zaid Nabulsi
24
0
0
11 Sep 2023
UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization
Rui Zhang
Hongxia Wang
Ming-han Du
Hanqing Liu
Yangqiaoyu Zhou
Q. Zeng
13
19
0
28 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
6
6
0
23 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
17
218
0
10 Aug 2023
Speaker Embeddings as Individuality Proxy for Voice Stress Detection
Zihan Wu
Neil Scheidwasser
Karl El Hajal
Milos Cernak
24
3
0
09 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
8
24
0
07 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Z. Tan
15
7
0
01 Jun 2023
LEAN: Light and Efficient Audio Classification Network
Shwetank Choudhary
C. Karthik
Punuru Sri Lakshmi
Sumit Kumar
AI4TS
17
5
0
22 May 2023
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
30
155
0
19 May 2023
Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity
Pablo Alonso-Jiménez
Xavier Favory
Hadrien Foroughmand
Grigoris Bourdalas
Xavier Serra
T. Lidy
Dmitry Bogdanov
21
6
0
24 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
17
6
0
22 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis
Mehdi Azabou
Michael J. Mendelson
Nauman Ahad
Maks Sorokin
S. Thakoor
Carolina Urzay
Eva L. Dyer
12
4
0
15 Mar 2023