data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
226
1
0
03 Dec 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
236
0
0
30 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
388
34
0
27 Nov 2023
SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation
Jia Li
Yanyan Shen
Lei Chen
Charles Wang Wai Ng
206
6
0
27 Nov 2023
Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling
Pattern Recognition (Pattern Recogn.), 2023
Daesoo Lee
Sara Malacarne
Erlend Aune
AI4TS
338
25
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
252
4
0
21 Nov 2023
Self-Distilled Representation Learning for Time Series
Felix Pieper
Konstantin Ditschuneit
Martin Genzel
Alexandra Lindt
Johannes Otterbach
AI4TS
157
3
0
19 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Heng-Jui Chang
James R. Glass
246
8
0
15 Nov 2023
SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification
Junyan Lin
Feng Gao
Xiaochen Shi
Junyu Dong
Q. Du
187
80
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Siddharth Srivastava
Gaurav Sharma
SSL
288
85
0
07 Nov 2023
FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Lisa Weijler
Florian Kowarsch
Michael Reiter
Pedro Hermosilla
Margarita Maurer-Granofszky
Michael N. Dworzak
MedIm
169
5
0
06 Nov 2023
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
R. N. Nandi
Mehadi Hasan Menon
Tareq Al Muntasir
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Tariqul Islam
Shammur A. Chowdhury
Firoj Alam
290
3
0
06 Nov 2023
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2023
Changdae Oh
Hyesu Lim
Mijoo Kim
Dongyoon Han
Junhyeok Park
Euiseog Jeong
Alexander G. Hauptmann
Zhi-Qi Cheng
Kyungwoo Song
VLM
743
30
0
03 Nov 2023
Investigating Relative Performance of Transfer and Meta Learning
Benji Alwis
90
0
0
31 Oct 2023
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
David Samuel
180
4
0
30 Oct 2023
Pre-training with Random Orthogonal Projection Image Modeling
International Conference on Learning Representations (ICLR), 2023
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
341
14
0
28 Oct 2023
Large-scale Foundation Models and Generative AI for BigData Neuroscience
Neurosciences research (Neurosci Res), 2023
Ran Wang
Zhe Sage Chen
MedIm, AI4CE, LRM
181
18
0
27 Oct 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Neural Information Processing Systems (NeurIPS), 2023
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
212
6
0
25 Oct 2023
Fine tuning Pre trained Models for Robustness Under Noisy Labels
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Sumyeong Ahn
Sihyeon Kim
Jongwoo Ko
SeYoung Yun
AAML, NoLa
372
16
0
24 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
395
11
0
22 Oct 2023
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim
Sanghyuk Chun
Byeongho Heo
Dongyoon Han
SSL
294
3
0
20 Oct 2023
A Car Model Identification System for Streamlining the Automobile Sales Process
Said Togru
Marco Moldovan
218
0
0
19 Oct 2023
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
Automatic Speech Recognition & Understanding (ASRU), 2023
H. Soltau
Izhak Shafran
Alex Ottenwess
Joseph R. Duffy
Rene L. Utianski
L. Barnard
John L. Stricker
D. Wiepert
David T. Jones
Hugo Botha
178
3
0
16 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
397
7
0
12 Oct 2023
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning
ACM Multimedia (ACM MM), 2023
Jiaqi Li
Guilin Qi
Chuanyi Zhang
Yongrui Chen
Yiming Tan
Chenlong Xia
Ye Tian
210
6
0
12 Oct 2023
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Songtao Luo
Shuang Yang
Shiguang Shan
Xilin Chen
295
2
0
08 Oct 2023
Enhancing Representations through Heterogeneous Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhongyu Li
Bo-Wen Yin
Yongxiang Liu
Tianpeng Liu
Ming-Ming Cheng
SSL
366
3
0
08 Oct 2023
OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks
Ofir Bar Tal
Adi Haviv
Amit H. Bermano
AAML
176
0
0
05 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
International Conference on Learning Representations (ICLR), 2023
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
273
36
0
04 Oct 2023
Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods
E. Zappala
Daniel Levine
Shiyang Zhang
S. Rizvi
Sacha Lévy
David van Dijk
168
1
0
02 Oct 2023
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Dongyuan Li
Yusong Wang
Kotaro Funakoshi
Manabu Okumura
347
5
0
30 Sep 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
212
5
0
29 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Shucheng Zhou
Marco Cristani
AI4TS, GNN
520
11
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Automatic Speech Recognition & Understanding (ASRU), 2023
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
265
13
0
26 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
259
2
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
212
4
0
26 Sep 2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Automatic Speech Recognition & Understanding (ASRU), 2023
Guan-lin Yang
Ziyang Ma
Zhisheng Zheng
Ya-Zhen Song
Zhikang Niu
Xie Chen
200
9
0
25 Sep 2023
M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders
Qibo Qiu
Honghui Yang
Wenxiao Wang
Shun Zhang
Haiming Gao
Haochao Ying
Wei Hua
Xiaofei He
3DPC
204
0
0
23 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
246
29
0
19 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
221
0
0
14 Sep 2023
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Heng-Jui Chang
Ning Dong
Ruslan Mavlyutov
Sravya Popuri
Yu-An Chung
335
8
0
14 Sep 2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
International Conference on Multimodal Interaction (ICMI), 2023
Anna Deichler
Shivam Mehta
Simon Alexanderson
Jonas Beskow
DiffM
227
30
0
11 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
IEEE Transactions on Automation Science and Engineering (IEEE TASE), 2023
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
289
22
0
10 Sep 2023
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Neural Information Processing Systems (NeurIPS), 2023
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tong Wang
Zhaoxiang Zhang
262
25
0
07 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Interspeech (Interspeech), 2023
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
239
6
0
05 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhichao Huang
Chutong Meng
Tom Ko
217
41
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
185
3
0
28 Aug 2023
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning
Amirhossein Vahidi
Lisa Wimmer
H. Gündüz
B. Bischl
Eyke Hüllermeier
Mina Rezaei
OOD, UQCV
293
4
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
208
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Computer Speech and Language (CSL), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
240
19
0
28 Aug 2023