v1, v2, v3 (latest)

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown

RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation

Nicolas Houdré

Diego Marcos

Hugo Riffaud de Turckheim

04 Dec 2025

AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning

Kohei Yamamoto

Kosuke Okusa

03 Dec 2025

Enhancing next token prediction based pre-training for jet foundation models

03 Dec 2025

Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization

Tal Shuster

Eliya Nachmani

01 Dec 2025

Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach

25 Nov 2025

Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation

20 Nov 2025

Unifying Model and Layer Fusion for Speech Foundation Models

Yi-Jen Shih

David Harwath

MoMe

11 Nov 2025

Understanding Hardness of Vision-Language Compositionality from A Token-level Causal Lens

30 Oct 2025

Perception Learning: A Formal Separation of Sensory Representation Learning from Decision Learning

Suman Sanyal

SSL

28 Oct 2025

SITS-DECO: A Generative Decoder Is All You Need For Multitask Satellite Image Time Series Modelling

Samuel J. Barrett

Docko Sow

21 Oct 2025

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

...

19 Oct 2025

Unifying Vision-Language Latents for Zero-label Image Caption Enhancement

14 Oct 2025

On the Alignment Between Supervised and Self-Supervised Contrastive Learning

09 Oct 2025

A Systematic Evaluation of Self-Supervised Learning for Label-Efficient Sleep Staging with Wearable EEG

Emilio Estevan

María Sierra-Torralba

Eduardo López-Larraz

Luis Montesano

09 Oct 2025

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual Speech Recognition Evaluation

08 Oct 2025

Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification

29 Sep 2025

Alternatives To Next Token Prediction In Text Generation - A Survey

Charlie Wyatt

Aditya Joshi

Flora D. Salim

29 Sep 2025

WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms

27 Sep 2025

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms

23 Sep 2025

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Vrunda N. Sukhadia

Shammur A. Chowdhury

18 Sep 2025

Label-Efficient Grasp Joint Prediction with Point-JEPA

Jed Guzelkabaagac

Boris Petrović

3DPC

13 Sep 2025

DyKen-Hyena: Dynamic Kernel Generation via Cross-Modal Attention for Multimodal Intent Recognition

Yifei Wang

Wenbin Wang

Yong Luo

12 Sep 2025

Deep Learning for Tuberculosis Screening in a High-burden Setting using Cough Analysis and Speech Foundation Models

11 Sep 2025

LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

Hai Huang

Yann LeCun

Randall Balestriero

11 Sep 2025

Segment Transformer: AI-Generated Music Detection via Music Structural Analysis

Yumin Kim

Seonghyeon Go

10 Sep 2025

Diffusion-Based Action Recognition Generalizes to Untrained Domains

10 Sep 2025

Mitigating Data Imbalance in Automated Speaking Assessment

03 Sep 2025

Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?

IEEE Signal Processing Letters (IEEE SPL), 2025

Abhijit Sinha

H. Kathania

Sudarsana Reddy Kadiri

Shrikanth Narayanan

28 Aug 2025

Zero-Shot KWS for Children's Speech using Layer-Wise Features from SSL Models

Pattern Recognition Letters (Pattern Recogn. Lett.), 2025

Subham Kutum

Abhijit Sinha

H. Kathania

Sudarsana Reddy Kadiri

Mahesh Chandra Govil

28 Aug 2025

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations

21 Aug 2025

MATPAC++: Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning

18 Aug 2025

Learn Faster and Remember More: Balancing Exploration and Exploitation for Continual Test-time Adaptation

18 Aug 2025

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization

Hyebin Ahn

Kangwook Jang

Hoirin Kim

17 Aug 2025

RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning

17 Aug 2025

VARAN: Variational Inference for Self-Supervised Speech Models Fine-Tuning on Downstream Tasks

16 Aug 2025

Benchmarking Prosody Encoding in Discrete Speech Tokens

15 Aug 2025

Emphasis Sensitivity in Speech Representations

Shaun Cassini

Thomas Hain

Anton Ragni

15 Aug 2025

S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision

...

09 Aug 2025

Foundation Models for Bioacoustics -- a Comparative Review

02 Aug 2025

PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective

Transactions of the International Society for Music Information Retrieval (TISMIR), 2025

02 Aug 2025

MINR: Implicit Neural Representations with Masked Image Modelling

Sua Lee

Joonhun Lee

Myungjoo Kang

30 Jul 2025

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

...

22 Jul 2025

Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models

22 Jul 2025

Supporting SENĆOŦEN Language Documentation Efforts with Automatic Speech Recognition

14 Jul 2025

USAD: Universal Speech and Audio Representation via Distillation

23 Jun 2025

Discrete JEPA: Learning Discrete Token Representations without Reconstruction

17 Jun 2025

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

International Conference on Learning Representations (ICLR), 2025

13 Jun 2025

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Yanlong Chen

Mattia Orlandi

Pierangelo Maria Rapa

Simone Benatti

Luca Benini

Yawei Li

12 Jun 2025

Vision Generalist Model: A Survey

International Journal of Computer Vision (IJCV), 2025

...

11 Jun 2025

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

IEEE International Conference on Robotics and Automation (ICRA), 2025

10 Jun 2025