data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
198
15
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
309
70
0
12 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
171
52
0
12 Dec 2022
TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Lixin Cao
Jun Wang
Ben Yang
Jane Polak Scowcroft
Dong Yu
144
4
0
12 Dec 2022
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temizel
214
8
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
303
3
0
08 Dec 2022
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Pengcheng Li
Genshun Wan
Fenglin Ding
Hang Chen
Jianqing Gao
Jia Pan
Cong Liu
SSL
190
1
0
07 Dec 2022
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Fenglin Ding
Genshun Wan
Pengcheng Li
Jia Pan
Cong Liu
SSL
290
1
0
07 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
242
15
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
330
23
0
05 Dec 2022
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning
Yi Zhou
Ruibin Yuan
Ge Zhang
Yi Ma
Chenghua Lin
...
Haoyu He
Emmanouil Benetos
Norbert Gyenge
Ruibo Liu
Jie Fu
SSL
184
28
0
05 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
114
16
0
03 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech (Interspeech), 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
265
13
0
29 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Pritam Sarkar
Ali Etemad
388
39
0
25 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
212
5
0
24 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
205
20
0
23 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE transactions on multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
274
51
0
21 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
IEEE International Conference on Computer Vision (ICCV), 2022
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
498
160
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Computer Vision and Pattern Recognition (CVPR), 2022
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
245
56
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM, CLIP
276
25
0
17 Nov 2022
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Peter Ebert Christensen
Vésteinn Snaebjarnarson
Andrea Dittadi
Serge Belongie
Sagie Benaim
AAML
228
1
0
17 Nov 2022
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Marc Fischer
Alexander Bartler
Bin Yang
SSeg
179
32
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2022
Hongwei Xue
Shiyang Feng
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
183
38
0
16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
223
4
0
16 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
621
901
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Interspeech (Interspeech), 2022
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
322
21
0
14 Nov 2022
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation
Yi Wang
Nassim Ait Ali Braham
Zhitong Xiong
Chenying Liu
C. Albrecht
Xiao Xiang Zhu
233
96
0
13 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Computer Vision and Pattern Recognition (CVPR), 2022
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT, CVBM
248
93
0
12 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Neural Information Processing Systems (NeurIPS), 2022
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
193
3
0
07 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
378
10
0
02 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
158
5
0
02 Nov 2022
Deep Multimodal Fusion for Generalizable Person Re-identification
Suncheng Xiang
Hao Chen
Jing Gao
Jiawang Mou
Ting Liu
Xiaobo Li
Yuzhuo Fu
308
6
0
02 Nov 2022
More Speaking or More Speakers?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Dan Berrebbi
R. Collobert
Navdeep Jaitly
Tatiana Likhomanenko
224
6
0
02 Nov 2022
Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2022
P. Wilson
Mahdi Gilany
A. Jamzad
Fahimeh Fooladgar
Minh-Son To
Brian Wodlinger
Purang Abolmaesumi
P. Mousavi
145
19
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
187
9
0
01 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Conference on Automated Knowledge Base Construction (AKBC), 2022
Elad Segal
Ben Bogin
Jonathan Berant
VLM
125
2
0
01 Nov 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
203
8
0
30 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2022
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
254
2
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM, SSL
182
43
0
27 Oct 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
398
43
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Masato Hagiwara
CLIP, VLM, AI4TS
178
45
0
26 Oct 2022
Learning Explicit Object-Centric Representations with Vision Transformers
Oscar Vikström
Alexander Ilin
OCL, ViT
215
5
0
25 Oct 2022
Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future
Guo-Jun Qi
M. Shah
SSL
156
8
0
23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
204
44
0
21 Oct 2022
Towards Sustainable Self-supervised Learning
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
CLL
354
11
0
20 Oct 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Neural Information Processing Systems (NeurIPS), 2022
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
373
127
0
19 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
242
42
0
19 Oct 2022
Continuous Pseudo-Labeling from the Start
International Conference on Learning Representations (ICLR), 2022
Dan Berrebbi
R. Collobert
Samy Bengio
Navdeep Jaitly
Tatiana Likhomanenko
228
17
0
17 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM, SSL
255
38
0
16 Oct 2022
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Spoken Language Technology Workshop (SLT), 2022
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter-Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
254
15
0
14 Oct 2022