data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
arXiv: 2202.03555

International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL VLM ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown
Multi-Modal Recommendation System with Auxiliary Information
Mufhumudzi Muthivhi
Terence L van Zyl
Hairong Wang
98
4
0
13 Oct 2022
The Hidden Uniform Cluster Prior in Self-Supervised Learning
International Conference on Learning Representations (ICLR), 2022
Mahmoud Assran
Randall Balestriero
Quentin Duval
Florian Bordes
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Nicolas Ballas
SSL
208
62
0
13 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Spoken Language Technology Workshop (SLT), 2022
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
197
15
0
13 Oct 2022
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Haoyu Wang
Weiqiang Zhang
Hongbin Suo
Yulong Wan
157
1
0
13 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
205
12
0
11 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
303
13
0
09 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Interspeech (Interspeech), 2022
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
225
7
0
08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
259
62
0
07 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
H. S. Bovbjerg
Zheng-Hua Tan
VLM
207
5
0
04 Oct 2022
That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Abitha Thankaraj
Lerrel Pinto
168
20
0
03 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM CLIP
403
40
0
03 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
276
10
0
30 Sep 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Spoken Language Technology Workshop (SLT), 2022
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
201
9
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
312
68
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Neural Information Processing Systems (NeurIPS), 2022
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
344
36
0
28 Sep 2022
An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Tobias Hallmen
Silvan Mertes
Dominik Schiller
Elisabeth André
126
5
0
28 Sep 2022
Implementing and Experimenting with Diffusion Models for Text-to-Image Generation
Robin Zbinden
135
5
0
22 Sep 2022
Deep Lake: a Lakehouse for Deep Learning
Conference on Innovative Data Systems Research (CIDR), 2022
S. Hambardzumyan
Abhina Tuli
Levon Ghukasyan
Fariz Rahman
Hrant Topchyan
...
Mark McQuade
M. Harutyunyan
Tatevik Hakobyan
I. Stranic
Davit Buniatyan
217
30
0
22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
268
1
0
17 Sep 2022
Exploring Target Representations for Masked Autoencoders
International Conference on Learning Representations (ICLR), 2022
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
671
58
0
08 Sep 2022
Generalization in Neural Networks: A Broad Survey
Neurocomputing (Neurocomputing), 2022
Chris Rohlfs
OOD AI4CE
279
19
0
04 Sep 2022
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
International Conference on Applied Informatics and Communication (ICAIC), 2022
Joon Sern Lee
Kai Keng Tay
Zong Fu Chua
93
2
0
02 Sep 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP VLM
281
222
0
25 Aug 2022
AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends
IEEE Open Journal of the Communications Society (OJ-COMS), 2022
Muhammad Zawish
Fayaz Ali Dharejo
Sunder Ali Khowaja
Saleem Raza
Steven Davy
Kapal Dev
P. Bellavista
241
117
0
23 Aug 2022
Estimating a potential without the agony of the partition function
SIAM Journal on Mathematics of Data Science (SIMODS), 2022
E. Haber
Moshe Eliasof
L. Tenorio
272
2
0
19 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
405
392
0
12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
302
83
0
11 Aug 2022
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
Computer Vision and Pattern Recognition (CVPR), 2022
Xiangwen Kong
Xiangyu Zhang
SSL
210
66
0
08 Aug 2022
SdAE: Self-distillated Masked Autoencoder
European Conference on Computer Vision (ECCV), 2022
Yabo Chen
Yuchen Liu
Dongsheng Jiang
Xiaopeng Zhang
Wenrui Dai
H. Xiong
Qi Tian
ViT
216
86
0
31 Jul 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
234
94
0
30 Jul 2022
UAVM: Towards Unifying Audio and Visual Models
IEEE Signal Processing Letters (SPL), 2022
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
299
30
0
29 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Knowledge Discovery and Data Mining (KDD), 2022
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
221
8
0
19 Jul 2022
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
European Conference on Computer Vision (ECCV), 2022
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
218
88
0
14 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Neural Information Processing Systems (NeurIPS), 2022
Wei-Ning Hsu
Bowen Shi
SSL VLM
316
52
0
14 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Interspeech (Interspeech), 2022
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
180
34
0
14 Jul 2022
Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
535
387
0
13 Jul 2022
Big Learning
Yulai Cong
Miaoyun Zhao
AI4CE
391
0
0
08 Jul 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR
Interspeech (Interspeech), 2022
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
178
10
0
03 Jul 2022
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy
Scientific Data (Sci Data), 2022
Nikil Ravi
Pranshu Chaturvedi
Eliu A. Huerta
Zhengchun Liu
Ryan Chard
Aristana Scourtas
K. J. Schmidt
Kyle Chard
Ben Blaiszik
Ian Foster
388
42
0
01 Jul 2022
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition
Interspeech (Interspeech), 2022
Einari Vaaras
Manu Airaksinen
Okko Räsänen
127
7
0
21 Jun 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Interspeech (Interspeech), 2022
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
206
21
0
21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
International Journal of Computer Vision (IJCV), 2022
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Jianlong Wu
Yong Liu
Dacheng Tao
ViT
304
47
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
268
118
0
16 Jun 2022
Masked Frequency Modeling for Self-Supervised Visual Pre-Training
International Conference on Learning Representations (ICLR), 2022
Jiahao Xie
Wei Li
Xiaohang Zhan
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
246
100
0
15 Jun 2022
Masked Siamese ConvNets
L. Jing
Jiachen Zhu
Yann LeCun
SSL
208
37
0
15 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
216
110
0
13 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations
Zhirong Wu
Zihang Lai
Xiao Sun
Stephen Lin
296
24
0
09 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments
International Conference on Learning Representations (ICLR), 2022
Raja Marjieh
Pol van Rijn
Ilia Sucholutsky
T. Sumers
Harin Lee
Thomas Griffiths
Nori Jacoby
262
22
0
08 Jun 2022
Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
International Conference on Learning Representations (ICLR), 2022
Jia Pan
Pan Zhou
Shuicheng Yan
SSL
349
20
0
08 Jun 2022
Masked Unsupervised Self-training for Label-free Image Classification
International Conference on Learning Representations (ICLR), 2022
Junnan Li
Silvio Savarese
Steven C. H. Hoi
VLM SSL
148
19
0
07 Jun 2022