data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown
Title
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
Jingyu Li
Lingchao Mao
Hairong Wang
Zhendong Wang
Xi Mao
Xuelei Sherry Ni
102
0
0
09 Jun 2025
MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements
Howon Ryu
Y. Chen
Yacun Wang
Andrea Z. LaCroix
Chongzhi Di
L. Natarajan
Yu Wang
Jingjing Zou
286
0
0
02 Jun 2025
GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Aleksandr Kutsakov
Alexandr Maximenko
Georgii Gospodinov
Pavel Bogomolov
Fyodor Minkin
193
0
0
01 Jun 2025
AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
Sarthak Kumar Maharana
Saksham Singh Kushwaha
Baoming Zhang
Adrian Rodriguez
Songtao Wei
Yapeng Tian
Yunhui Guo
TTA, VLM
261
0
0
31 May 2025
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
Tianyi Xu
Hongjie Chen
Wang Qing
Lv Hang
Jian Kang
Li Jie
Zhennan Lin
Yongxiang Li
Xie Lei
227
3
0
27 May 2025
Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models
Zhaoqing Li
Haoning Xu
Xurong Xie
Zengrui Jin
Tianzi Wang
Xunying Liu
171
0
0
27 May 2025
Automated data curation for self-supervised learning in underwater acoustic analysis
Hilde I. Hummel
Sandjai Bhulai
Burooj Ghani
R. V. D. Mei
170
0
0
26 May 2025
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
358
3
0
22 May 2025
SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit
Wen-Chin Huang
Erica Cooper
Tomoki Toda
314
5
0
21 May 2025
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition
Shuo Zhang
Jinsong Zhang
Zhejun Zhang
Lei Li
MoE
149
0
0
20 May 2025
Self-supervised perception for tactile skin covered dexterous hands
Akash Sharma
Carolina Higuera
Chaithanya Krishna Bodduluri
Ziqiang Liu
Taosha Fan
...
Byron Boots
Michael Kaess
Tingfan Wu
Francois Robert Hogan
Mustafa Mukadam
SSL
230
4
0
16 May 2025
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
ELM
281
2
0
10 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
490
2
0
06 May 2025
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
947
1
0
02 May 2025
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures
Max Hartman
Lav Varshney
228
0
0
22 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Yang Yue
Yulin Wang
Chenxin Tao
Pan Liu
Shiji Song
Gao Huang
MedIm
291
3
0
18 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Computer Vision and Pattern Recognition (CVPR), 2025
Yang Yue
Yulin Wang
Haojun Jiang
Pan Liu
Qing Xiao
Gao Huang
VGen
326
6
0
17 Apr 2025
Balancing long- and short-term dynamics for the modeling of saliency in videos
Theodor Wulff
Fares Abawi
Philipp Allgeuer
Stefan Wermter
144
0
0
08 Apr 2025
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
Shabnam Choudhury
Yash Salunkhe
Sarthak Mehrotra
Biplab Banerjee
272
2
0
04 Apr 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haizhou Li
AI4TS
203
0
0
03 Apr 2025
Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation
Mingrui Ye
Lianping Yang
Hegui Zhu
Zenghao Zheng
Xin Wang
Yantao Lo
ViT
273
1
0
02 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen, VGen
519
3
0
01 Apr 2025
Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting
Alimjan Mattursun
Liejun Wang
Yinfeng Yu
Chunyang Ma
234
1
0
27 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
282
7
0
26 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
266
0
0
20 Mar 2025
Heterogeneous bimodal attention fusion for speech emotion recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua Reiss
304
0
0
09 Mar 2025
The order in speech disorder: a scoping review of state of the art machine learning methods for clinical speech classification
Birger Moëll
Fredrik Sand Aronsson
Per Östberg
Jonas Beskow
124
2
0
03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
International Conference on Learning Representations (ICLR), 2025
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
252
4
0
02 Mar 2025
Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xin-yang Zhao
Jian Jin
Yang-yang Li
Yazhou Yao
225
0
0
27 Feb 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García
Miguel Cazorla
Jorge Pomares
203
0
0
25 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
492
15
0
24 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Pattern Recognition (Pattern Recogn.), 2022
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
366
3
0
24 Feb 2025
voc2vec: A Foundation Model for Non-Verbal Vocalization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Alkis Koudounas
Moreno La Quatra
Marco Sabato Siniscalchi
Elena Baralis
239
12
0
22 Feb 2025
On the Robust Approximation of ASR Metrics
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Abdul Waheed
Hanin Atwany
Rita Singh
Bhiksha Raj
299
2
0
18 Feb 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
SSL
357
6
0
17 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
614
2
0
10 Feb 2025
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies
Applied Sciences (AS), 2025
C. Ciușdel
Alex Serban
Tiziano Passerini
CoGe
270
1
0
03 Feb 2025
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation
Reza Akbarian Bafghi
Carden Bagwell
Avinash Ravichandran
Ashish Shrivastava
M. Raissi
240
4
0
28 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
International Conference on Learning Representations (ICLR), 2025
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
464
4
0
23 Jan 2025
Selective Attention Merging for low resource tasks: A case study of Child ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Natarajan Balaji Shankar
Zilai Wang
Eray Eren
Abeer Alwan
MoMe
136
5
0
14 Jan 2025
Optimizing Speech Multi-View Feature Fusion through Conditional Computation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Weiqiao Shan
Yuhao Zhang
Yuchen Han
Yangqiu Song
X. Zhao
Yongqian Li
Hao Fei
Hao Yang
Tong Xiao
Jingbo Zhu
149
0
0
14 Jan 2025
On Creating A Brain-To-Text Decoder
Zenon Lamprou
Yashar Moshfeghi
193
1
0
10 Jan 2025
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Ruoyu Zhao
Xiantao Jiang
Fei Yu
Azzedine Boukerche
Tao Wang
Shanghang Zhang
256
0
0
06 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling
Junmyeong Lee
Eui Jun Hwang
Sukmin Cho
Jong C. Park
180
0
0
06 Jan 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
R. Mamidi
Zijiao Chen
Subba Reddy Oota
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV, AI4CE
375
23
0
31 Dec 2024
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shentong Mo
189
1
0
23 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
296
0
0
18 Dec 2024
Open Universal Arabic ASR Leaderboard
Yingzhi Wang
Anas Alhmoud
Muhammad Alqurishi
ELM
164
6
0
18 Dec 2024
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
Computer Vision and Pattern Recognition (CVPR), 2024
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
286
5
0
18 Dec 2024
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Salar Abbaspourazad
Anshuman Mishra
Joseph D. Futoma
Andrew C. Miller
Ian Shapiro
445
4
0
15 Dec 2024