ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
    SSL
    VLM
    ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 114 papers shown
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
ELM
19
0
0
10 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
42
0
0
06 May 2025
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
32
0
0
02 May 2025
Heterogeneous bimodal attention fusion for speech emotion recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua Reiss
42
0
0
09 Mar 2025
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies
C. Ciușdel
Alex Serban
Tiziano Passerini
CoGe
64
1
0
03 Feb 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Salar Abbaspourazad
Anshuman Mishra
Joseph D. Futoma
Andrew C. Miller
Ian Shapiro
83
0
0
15 Dec 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
58
3
0
14 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
23
0
0
09 Oct 2024
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Yi-Jen Shih
Zoi Gkalitsiou
A. Dimakis
David Harwath
24
1
0
16 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
M. Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
39
3
0
30 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
47
32
0
29 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
24
4
0
21 Jul 2024
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
Rui Liu
Zening Ma
SSL
29
1
0
10 Jun 2024
Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining
Valentin Vielzeuf
SSL
30
0
0
14 May 2024
AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly
Roshan Nayak
Rakshith Rao
Ujan Deb
AP Prathosh
21
1
0
11 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP
VLM
50
19
0
30 Apr 2024
Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
Ayumu Saito
Prachi Kudeshia
Jiju Poovvancheri
3DPC
22
7
0
25 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
46
1
0
16 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
23
23
0
12 Apr 2024
NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG
Cheol-Hui Lee
Hakseung Kim
Hyun-jee Han
Min-Kyung Jung
Byung C. Yoon
Dong-Joo Kim
27
5
0
10 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
27
5
0
28 Mar 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
57
2
0
21 Mar 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Licai Sun
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
11
12
0
07 Jan 2024
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
34
3
0
30 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
19
1
0
18 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
25
62
0
11 Dec 2023
LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures
Vimal Thilak
Chen Huang
Omid Saremi
Laurent Dinh
Hanlin Goh
Preetum Nakkiran
Josh Susskind
Etai Littwin
18
7
0
07 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
32
4
0
05 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
18
17
0
27 Nov 2023
SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation
Jia Li
Yanyan Shen
Lei Chen
Charles Wang Wai Ng
9
3
0
27 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
12
3
0
15 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
16
64
0
07 Nov 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
10
3
0
12 Oct 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS
GNN
44
3
0
27 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
22
4
0
05 Sep 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
15
2
0
28 Aug 2023
Elucidate Gender Fairness in Singing Voice Transcription
Xiangming Gu
Weizhen Zeng
Ye Wang
10
3
0
05 Aug 2023
Learn from Incomplete Tactile Data: Tactile Representation Learning with Masked Autoencoders
G. Cao
Jiaqi Jiang
Danushka Bollegala
Shan Luo
13
10
0
14 Jul 2023
Self-supervised adversarial masking for 3D point cloud representation learning
Michal Szachniewicz
Wojciech Kozlowski
Michal Stypulkowski
Maciej Zięba
3DPC
4
2
0
11 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
24
5
0
11 Jul 2023
Decentralized Quantum Federated Learning for Metaverse: Analysis, Design and Implementation
Devya Gurung
Shiva Raj Pokhrel
Gang Li
16
5
0
20 Jun 2023
Quantifying the Variability Collapse of Neural Networks
Jing-Xue Xu
Haoxiong Liu
26
4
0
06 Jun 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Lingwei Meng
Jiawen Kang
Mingyu Cui
Haibin Wu
Xixin Wu
Helen M. Meng
18
10
0
25 May 2023
Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
Petar Ivanov
Ivan Koychev
Momchil Hardalov
Preslav Nakov
19
4
0
24 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
10
5
0
19 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
26
2
0
19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
27
5
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
13
113
0
18 May 2023